Showcase official/verified results

#2
by nouamanetazi - opened
Massive Text Embedding Benchmark org
edited Oct 1, 2022

We can probably mark/colorize the official/verified results, and note their count as well, like this:

  • Total Models: 30 (24 official + 6 self-reported)

Wdyt @Muennighoff ?

Massive Text Embedding Benchmark org

I think this is a good idea, but would wait for a few actual non-official results to come in

Massive Text Embedding Benchmark org

What does "official" mean in this context?

All new submissions (we receive at least a couple a week) typically go through the following steps:

  • Add an implementation to mteb (typically they just use the wrapper for sentence transformers, so they only have to supply the metadata); see the sketch after this list
  • Run the evaluations using that implementation (though they could change the implementation or the results afterwards)
  • Submit the results to embedding-benchmark/results, where we do a review that checks for outliers (we have discovered a few cases where providers forgot to tell us that they trained on one of the datasets)
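
For concreteness, here is a minimal sketch of that flow, assuming the standard mteb + sentence-transformers usage; the model and task names below are illustrative, not a specific submission:

```python
import mteb
from sentence_transformers import SentenceTransformer

# 1. The "implementation": load the model through the sentence transformers
#    wrapper, so only metadata needs supplying on the mteb side.
model = SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")

# 2. Run the evaluations with that same implementation.
tasks = mteb.get_tasks(tasks=["Banking77Classification"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(model, output_folder="results")

# 3. The JSON files written under results/ are what gets submitted
#    to embedding-benchmark/results for review.
```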

We do not track who evaluated the model.

We have a few historic data points that are completely self-reported without validation (submitted by pushing results to the model card); this submission process is no longer possible.

We could add a symbol for "Reproducible"

Massive Text Embedding Benchmark org

I think since implementations are now required it is probably fine to close this! I guess things that turn out to be non-reproducible will be flagged by users and then removed unless fixed anyway

Massive Text Embedding Benchmark org

Yeah, if people find stuff that doesn't reproduce, we either rerun it or remove it (after giving the authors the chance to fix it).

KennethEnevoldsen changed discussion status to closed
