Showcase official/verified results
We can probably mark/colorize the official/verified results, and note their count as well like this:
- Total Models: 30 (24 official + 6 self-reported)
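A minimal sketch of how that count line could be computed, assuming the leaderboard table is a pandas DataFrame with a hypothetical `is_official` flag (the column name is made up for illustration):

```python
import pandas as pd

# Toy leaderboard data; `is_official` is a hypothetical flag,
# not a column that exists on the real leaderboard.
df = pd.DataFrame({
    "model": ["model-a", "model-b", "model-c"],
    "is_official": [True, True, False],
})

official = int(df["is_official"].sum())
self_reported = len(df) - official
print(f"Total Models: {len(df)} ({official} official + {self_reported} self-reported)")
```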
Wdyt @Muennighoff ?
I think this is a good idea, but would wait for a few actual non-official results to come in
What does "official" mean in this context?
All new submissions (we receive at least a couple a week) typically go through the following:
- Add an implementation to `mteb` (typically they will just use the wrapper for sentence transformers, so they only have to supply the metadata)
- Using the implementation, run the evaluation (sketched below), though they could change the implementation or the results afterwards
- Submit the results to `embeddings-benchmark/results`, where we do a review that checks for outliers (we have discovered a few cases where providers forgot to tell us that they trained on one of the datasets)
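As a rough sketch of the evaluation step, assuming a recent `mteb` version; the model and task names are illustrative, not a prescribed submission script:

```python
from sentence_transformers import SentenceTransformer

import mteb

# Load the model through sentence-transformers (the common path for
# submissions that only supply metadata); model name is illustrative.
model = SentenceTransformer("intfloat/e5-small-v2")

# Task selection is illustrative; a real submission runs the full benchmark.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)

# Writes per-task result JSON files, which are then submitted to
# embeddings-benchmark/results for review.
evaluation.run(model, output_folder="results/e5-small-v2")
```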
We do not track who evaluated the model.
We have a few historic data points that are completely self-reported without validation (submitted by pushing results to the model card); this submission process is no longer possible.
We could add a symbol for "Reproducible"
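A tiny sketch of one way to render such a flag (a hypothetical helper, not existing leaderboard code):

```python
def display_name(model: str, reproducible: bool) -> str:
    # The check mark is just one possible symbol choice.
    return f"✓ {model}" if reproducible else model

print(display_name("model-a", True))   # ✓ model-a
print(display_name("model-b", False))  # model-b
```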
I think since implementations are now required, it is probably fine to close this! I guess things that turn out to be non-reproducible might be flagged by users and then removed unless fixed anyway.
Yeah, if people find stuff that doesn't reproduce, we either rerun it or remove it (after giving the authors a chance to fix it).