ONNX + GGUF version?

#2
by charnould - opened

Hello,
First, thanks for your work!
I'd like to use this embedding model through transformers.js. Is there any way to get this model in ONNX format?
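For context, here's roughly what I have in mind on the transformers.js side once an ONNX export exists (a minimal sketch; the model id is a placeholder, and the pipeline options are just the usual embedding defaults):

```js
import { pipeline } from '@huggingface/transformers';

// Placeholder model id; assumes an ONNX export of this model gets published.
const extractor = await pipeline('feature-extraction', '<this-model-id>');

// Mean-pooled, L2-normalized sentence embedding.
const output = await extractor('Hello world', { pooling: 'mean', normalize: true });
console.log(output.dims); // [1, hidden_size]
```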

Parallia org

Hi! While it should be possible to convert the model to ONNX, I wouldn't recommend it, as this would require disabling flash attention, which goes a long way toward making this model efficient.

See here for details:
https://github.com/AnswerDotAI/ModernBERT/issues/173#issuecomment-2667400927
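For reference, the export itself would be the standard Optimum flow, something like the sketch below, but it would trace the non-flash-attention path, which is exactly the efficiency loss mentioned above (model id is a placeholder; untested for this model):

```bash
# Hypothetical export via Hugging Face Optimum (placeholder model id).
# The exported graph uses the standard attention path, not flash attention.
optimum-cli export onnx --model <this-model-id> --task feature-extraction onnx_output/
```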

Thanks, I’ll wait for the issue to be closed.

charnould changed discussion status to closed

Hello,
I'm reusing this discussion to ask about the GGUF format (to consume this model with llama.cpp).
Is there any way to get a GGUF version, or is that also not recommended?
Thanks again.

charnould changed discussion status to open
charnould changed discussion title from ONNX version? to ONNX + GGUF version?
Parallia org

I agree a GGUF version would be valuable. As far as I can see, there is currently no GGUF implementation of ModernBERT. I won't be able to contribute one in the near future, but I assume you're not the only one who would be interested! I would +1 this request on the ModernBERT and GGUF repositories and hope it eventually gets worked on. This is the downside of building on a model from a smaller lab like Answer.AI: they have limited bandwidth to provide all the adjacent features such as ONNX and GGUF conversions.
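For anyone who lands here later: once llama.cpp gains a ModernBERT implementation, I'd expect the usual embedding workflow to apply, along these lines (a sketch assuming future support; neither step works for this model today):

```bash
# Hypothetical, pending ModernBERT support in llama.cpp.
# 1) Convert the Hugging Face checkpoint to GGUF:
python convert_hf_to_gguf.py /path/to/checkpoint --outfile model.gguf
# 2) Compute an embedding with the llama-embedding tool:
llama-embedding -m model.gguf -p "Hello world" --pooling mean
```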
