Thank you for the kind words, really glad you enjoyed the post!
We usually compute the forward KL: KL(P‖Q), where P is the original (full-precision) model's output distribution and Q is the quantized model's.
This forward direction emphasizes mode covering: it penalizes cases where the quantized model assigns too little probability to tokens the original model thought were likely. So yes, it's relatively forgiving when Q spreads its probability mass out, but it punishes Q for "missing" key peaks of P.
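For concreteness, here's a minimal sketch of how that per-position forward KL could be computed from the two models' logits (PyTorch; the function and argument names are just illustrative, not from our actual eval code):

```python
import torch
import torch.nn.functional as F

def forward_kl(logits_p: torch.Tensor, logits_q: torch.Tensor) -> torch.Tensor:
    """KL(P‖Q) per position.

    logits_p: (batch, vocab) logits from the full-precision model.
    logits_q: (batch, vocab) logits from the quantized model.
    """
    log_p = F.log_softmax(logits_p, dim=-1)  # log P, numerically stable
    log_q = F.log_softmax(logits_q, dim=-1)  # log Q
    p = log_p.exp()
    # KL(P‖Q) = sum_v P(v) * (log P(v) - log Q(v))
    return (p * (log_p - log_q)).sum(dim=-1)
```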
So to your analogy: yes, when I talk about preserving the original model's behavior, I'm leaning into this "mode-covering" perspective. The idea is that the quantized model should keep probability mass on the same likely outputs as the original. If it diverges too far, say by ignoring a high-confidence token the full-precision model preferred, that's exactly where forward KL catches it (and where flips are most likely to show up).
One caveat: reverse KL, KL(Q‖P), behaves very differently. It's mode-seeking: it penalizes the quantized model for being confident about tokens the original model wasn't, which could be useful in some contexts. But it's generally less stable in practice, because the log P term blows up wherever P assigns near-zero probability to a token Q still favors.
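Same hypothetical setup as the sketch above, just with the two roles swapped, and you can see exactly where the instability comes from:

```python
def reverse_kl(logits_p: torch.Tensor, logits_q: torch.Tensor) -> torch.Tensor:
    """KL(Q‖P) per position; same shapes as forward_kl above."""
    log_p = F.log_softmax(logits_p, dim=-1)
    log_q = F.log_softmax(logits_q, dim=-1)
    q = log_q.exp()
    # KL(Q‖P) = sum_v Q(v) * (log Q(v) - log P(v)).
    # The -log P(v) term can become enormous wherever Q keeps mass on
    # a token that P has pushed toward zero probability, which is the
    # instability mentioned above.
    return (q * (log_q - log_p)).sum(dim=-1)
```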