ONNX + GGUF version?
Hello,
First, thanks for your work!
I'd like to use this embedding model through transformers.js. Is there any way to get this model in ONNX format?
Hi! While it should be possible to convert the model to ONNX, I wouldn't recommend it: the conversion would require disabling flash attention, which goes a long way toward making this model efficient.
See here for details:
https://github.com/AnswerDotAI/ModernBERT/issues/173#issuecomment-2667400927
Thanks, I’ll wait for the issue to be closed.
Hello,
I'm reusing the same conversation to ask about the GGUF format (to consume this model with llama.cpp). Is there any way to get a GGUF version, or is that also not recommended?
Thanks again.
I agree a GGUF version would be valuable. As far as I can tell, there is currently no GGUF implementation of ModernBERT. I won't be able to contribute one in the near future, but I assume you're not the only one who would be interested! I'd +1 this request on the ModernBERT and GGUF repositories and hope it eventually gets worked on. This is the downside of building on a model from a smaller lab like Answer.AI: they have limited bandwidth to provide all the adjacent features, such as ONNX and GGUF conversions.