ONNX + GGUF version?

#2
by charnould - opened

Hello,
First, thanks for your work!
I'd like to use this embedding model through transformers.js. Is there any way to get this model in ONNX format?
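For context, here's roughly what I have in mind on the transformers.js side once an ONNX export exists (a minimal sketch; the model id is a placeholder, and the pipeline options are just the usual embedding defaults):

```js
import { pipeline } from '@huggingface/transformers';

// Placeholder model id; assumes an ONNX export of this model gets published.
const extractor = await pipeline('feature-extraction', '<this-model-id>');

// Mean-pooled, L2-normalized sentence embedding.
const output = await extractor('Hello world', { pooling: 'mean', normalize: true });
console.log(output.dims); // [1, hidden_size]
```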

Parallia org

Hi! While it should be possible to convert the model to ONNX, I wouldn't recommend it, as this would require disabling flash attention, which goes a long way toward making this model efficient.

See here for details:
https://github.com/AnswerDotAI/ModernBERT/issues/173#issuecomment-2667400927
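For reference, the export itself would be the standard Optimum flow, something like the sketch below, but it would trace the non-flash-attention path, which is exactly the efficiency loss mentioned above (model id is a placeholder; untested for this model):

```bash
# Hypothetical export via Hugging Face Optimum (placeholder model id).
# The exported graph uses the standard attention path, not flash attention.
optimum-cli export onnx --model <this-model-id> --task feature-extraction onnx_output/
```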

Thanks, I’ll wait for the issue to be closed.

charnould changed discussion status to closed

Hello,
I'm reusing this discussion to ask about the GGUF format (to consume this model with llama.cpp).
Is there any way to get a GGUF version, or is that also not recommended?
Thanks again.

charnould changed discussion status to open
charnould changed discussion title from ONNX version? to ONNX + GGUF version?
Parallia org

I agree a GGUF version would be valuable. As far as I can see, there is currently no GGUF implementation of ModernBERT. I won't be able to contribute one in the near future, but I assume you're not the only one who would be interested! I would +1 this request on the ModernBERT and GGUF repositories and hope it eventually gets worked on. This is the downside of building on a model from a smaller lab like Answer.AI: they have limited bandwidth to provide all the adjacent features such as ONNX and GGUF conversions.
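For anyone who lands here later: once llama.cpp gains a ModernBERT implementation, I'd expect the usual embedding workflow to apply, along these lines (a sketch assuming future support; neither step works for this model today):

```bash
# Hypothetical, pending ModernBERT support in llama.cpp.
# 1) Convert the Hugging Face checkpoint to GGUF:
python convert_hf_to_gguf.py /path/to/checkpoint --outfile model.gguf
# 2) Compute an embedding with the llama-embedding tool:
llama-embedding -m model.gguf -p "Hello world" --pooling mean
```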
