Massive Text Embedding Benchmark

non-profit

https://github.com/embeddings-benchmark

embeddings-benchmark

Activity Feed

AI & ML interests

Massive Text Embedding Benchmark

mteb's activity

Samoed

updated a dataset 24 minutes ago

mteb/XNLIV2

Viewer • Updated 24 minutes ago • 13.7k • 12

Samoed

updated a dataset 29 minutes ago

mteb/AlloprofReranking

Preview • Updated 29 minutes ago • 68

kardosdrur

updated a Space about 18 hours ago

5.42k

MTEB Leaderboard

🥇

Embedding Leaderboard

hepengfe

updated a dataset 1 day ago

mteb/common_voice_20_0

Viewer • Updated 1 day ago • 1 • 34

orionweller

updated a dataset 2 days ago

mteb/results

Updated 2 days ago • 1.28k • 1

KennethEnevoldsen

updated a collection 2 days ago

MTEB Papers

Collection

This is a collection of MTEB papers (not exhaustive). • 7 items • Updated 2 days ago • 1

gowitheflow

authored a paper 2 days ago

MMTEB: Massive Multilingual Text Embedding Benchmark

Paper • 2502.13595 • Published Feb 19 • 34

isaacchung

authored a paper 2 days ago

MIEB: Massive Image Embedding Benchmark

Paper • 2504.10471 • Published 4 days ago • 14

Samoed

authored a paper 2 days ago

MIEB: Massive Image Embedding Benchmark

Paper • 2504.10471 • Published 4 days ago • 14

KennethEnevoldsen

authored a paper 2 days ago

MIEB: Massive Image Embedding Benchmark

Paper • 2504.10471 • Published 4 days ago • 14

gowitheflow

authored a paper 2 days ago

MIEB: Massive Image Embedding Benchmark

Paper • 2504.10471 • Published 4 days ago • 14

Muennighoff

updated a dataset 2 days ago

mteb/assets

Viewer • Updated 2 days ago • 8 • 12

Muennighoff

published a dataset 2 days ago

mteb/assets

Viewer • Updated 2 days ago • 8 • 12

Muennighoff

updated a Space 3 days ago

5.42k

MTEB Leaderboard

🥇

Embedding Leaderboard

tomaarsen

posted an update 3 days ago

Post

2406

I just released Sentence Transformers v4.1, featuring ONNX and OpenVINO backends for rerankers, which offer 2-3x speedups, and improved hard negatives mining, which helps prepare stronger training datasets. Details:

🏎️ ONNX, OpenVINO, Optimization, Quantization
- I've added ONNX and OpenVINO support with just one extra argument: "backend" when loading the CrossEncoder reranker, e.g.: CrossEncoder("cross-encoder/ms-marco-MiniLM-L6-v2", backend="onnx")
- The export_optimized_onnx_model, export_dynamic_quantized_onnx_model, and export_static_quantized_openvino_model functions now work with CrossEncoder rerankers, allowing you to optimize (e.g. fusions and GELU approximations) or quantize (int8 weights) rerankers.
- I've uploaded ~340 ONNX & OpenVINO models for all existing models under the cross-encoder Hugging Face organization. You can use these without having to export when loading.

⛏ Improved Hard Negatives Mining
- Added 'absolute_margin' and 'relative_margin' arguments to mine_hard_negatives.
- absolute_margin ensures that sim(query, negative) < sim(query, positive) - absolute_margin, i.e. an absolute margin between the negative & positive similarities.
- relative_margin ensures that sim(query, negative) < sim(query, positive) * (1 - relative_margin), i.e. a relative margin between the negative & positive similarities.
- Inspired by the excellent NV-Retriever paper from NVIDIA.
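
To make the two margin conditions above concrete, here is a minimal sketch in plain Python. It does not call Sentence Transformers and is not how mine_hard_negatives is implemented; the helper name `filter_negatives` and the example similarity scores are hypothetical, and only the two inequalities are taken from the post.

```python
from typing import List, Optional, Tuple


def filter_negatives(
    pos_sim: float,
    candidates: List[Tuple[str, float]],
    absolute_margin: Optional[float] = None,
    relative_margin: Optional[float] = None,
) -> List[str]:
    """Keep candidate negatives that satisfy the margin conditions.

    Hypothetical helper for illustration only. `pos_sim` is
    sim(query, positive); each candidate is (text, sim(query, negative)).
    """
    kept = []
    for text, neg_sim in candidates:
        # absolute_margin: sim(query, negative) < sim(query, positive) - absolute_margin
        if absolute_margin is not None and not (neg_sim < pos_sim - absolute_margin):
            continue
        # relative_margin: sim(query, negative) < sim(query, positive) * (1 - relative_margin)
        if relative_margin is not None and not (neg_sim < pos_sim * (1 - relative_margin)):
            continue
        kept.append(text)
    return kept


# Made-up similarity scores for one query whose positive scores 0.8.
candidates = [("near duplicate", 0.75), ("related", 0.65), ("unrelated", 0.30)]

abs_kept = filter_negatives(0.8, candidates, absolute_margin=0.1)
rel_kept = filter_negatives(0.8, candidates, relative_margin=0.05)
print(abs_kept)
print(rel_kept)
```

With these made-up scores, an absolute margin of 0.1 rejects the candidate at 0.75 because it sits too close to the positive, while a relative margin of 0.05 still accepts it, since the cutoff scales with the positive's similarity.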

And several other small improvements. Check out the full release notes here: https://github.com/UKPLab/sentence-transformers/releases/tag/v4.1.0

This release brings near feature parity between the SentenceTransformer embedding and CrossEncoder reranker models, which I've wanted to do for quite some time! With rerankers now strongly supported, it's time to look forward to other useful architectures!

Samoed

in mteb/leaderboard_legacy 3 days ago

leaderboard_legacy error

#2 opened 3 days ago by yaoone

vaibhavad

authored a paper 7 days ago

DeepSeek-R1 Thoughtology: Let's <think> about LLM Reasoning

Paper • 2504.07128 • Published 16 days ago • 79

nouamanetazi

authored a paper 10 days ago

SmolVLM: Redefining small and efficient multimodal models

Paper • 2504.05299 • Published 11 days ago • 161

Muennighoff

updated a Space 13 days ago

106

MTEB Arena

⚔

Teach, test, evaluate language models with MTEB Arena

mmhamdy

posted an update 19 days ago

Post

1569

What inspired the Transformer architecture in the "Attention Is All You Need" paper? And how were various ideas combined to create this groundbreaking model?

In this lengthy article, I explore the story and the origins of some of the ideas introduced in the paper. We'll cover everything from the fundamental attention mechanism that lies at its heart to the surprisingly simple explanation for its name, Transformer.

💡 Examples of ideas explored in the article:

✅ What was the inspiration for the attention mechanism?
✅ How did we go from attention to self-attention?
✅ Did the team have any other names in mind for the model?

and more...

I aim to tell the story of Transformers as I would have wanted to read it, and hopefully, one that appeals to others interested in the details of this fascinating idea. This narrative draws from video interviews, lectures, articles, tweets/Xs, and some digging into the literature. I have done my best to be accurate, but errors are possible. If you find inaccuracies or have any additions, please do reach out, and I will gladly make the necessary updates.

Read the article: https://huggingface.co/blog/mmhamdy/pandemonium-the-transformers-story

Team members 40
