Rahim Khan

rahim-xelpmoc

AI & ML interests

None yet

Recent Activity

Organizations

Hugging Face Discord Community

rahim-xelpmoc's activity

upvoted an article 6 days ago

How to generate text: using different decoding methods for language generation with Transformers

• 193
New activity in jamesliu1217/EasyControl_Ghibli 13 days ago
published a Space 28 days ago
New activity in ds4sd/SmolDocling-256M-preview about 1 month ago

hallucinating a lot
#30 opened about 1 month ago by rahim-xelpmoc
reacted to AdinaY's post with 😎 2 months ago
reacted to nicolay-r's post with 🔥 3 months ago
📢 The LLaMA-3.1-8B distilled version of DeepSeek R1 is available, alongside the Qwen-based one.

📙 Notebook for using it in reasoning over a series of data 🧠:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/tutorials/llm_deep_seek_7b_distill_llama3.ipynb

Loading using the pipeline API of the transformers library:
https://github.com/nicolay-r/nlp-thirdgate/blob/master/llm/transformers_llama.py
🟡 GPU usage: 12.3 GB (FP16/FP32 mode), which is suitable for a T4 (1.5 GB less than the Qwen-distilled version).
🌍 Performance on a T4 instance: ~0.19 tokens/sec (FP32 mode) and ~0.22-0.30 tokens/sec (FP16 mode). Should it be that slow? 🤔
Model name: deepseek-ai/DeepSeek-R1-Distill-Llama-8B
โญ Framework: https://github.com/nicolay-r/bulk-chain
๐ŸŒŒ Notebooks and models hub: https://github.com/nicolay-r/nlp-thirdgate
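The post above mentions loading the model via the pipeline API of the `transformers` library. A minimal sketch of how that loading step might look, assuming the model name from the post; the FP16 dtype and `device_map` settings are illustrative assumptions, not taken from the linked script:

```python
MODEL_NAME = "deepseek-ai/DeepSeek-R1-Distill-Llama-8B"


def build_generator(model_name: str = MODEL_NAME):
    """Build a text-generation pipeline for the distilled R1 model.

    FP16 keeps GPU memory around the ~12 GB noted in the post, which
    fits a T4 instance.
    """
    # Imported lazily so the module can be inspected without pulling in
    # the heavy dependencies.
    import torch
    from transformers import pipeline

    return pipeline(
        "text-generation",
        model=model_name,
        torch_dtype=torch.float16,  # FP16 mode, as benchmarked in the post
        device_map="auto",          # place weights on the available GPU
    )


if __name__ == "__main__":
    generator = build_generator()
    out = generator(
        "Explain chain-of-thought reasoning in one sentence.",
        max_new_tokens=64,
    )
    print(out[0]["generated_text"])
```

The first call downloads ~16 GB of weights from the Hugging Face Hub, so expect a long cold start before any tokens are generated.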