---
license: mit
datasets:
- mteb/scifact
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- CSR
model-index:
- name: CSR
  results:
  - dataset:
      name: MTEB SciFact
      type: mteb/scifact
      revision: 0228b52cf27578f30900b9e5271d331663a030d7
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.59333
    - type: ndcg@3
      value: 0.65703
    - type: ndcg@5
      value: 0.67072
    - type: ndcg@10
      value: 0.68412
    - type: ndcg@20
      value: 0.69238
    - type: ndcg@100
      value: 0.70514
    - type: ndcg@1000
      value: 0.71517
    - type: map@1
      value: 0.5675
    - type: map@3
      value: 0.63602
    - type: map@5
      value: 0.64712
    - type: map@10
      value: 0.65301
    - type: map@20
      value: 0.65552
    - type: map@100
      value: 0.65778
    - type: map@1000
      value: 0.65815
    - type: recall@1
      value: 0.5675
    - type: recall@3
      value: 0.69772
    - type: recall@5
      value: 0.73367
    - type: recall@10
      value: 0.77333
    - type: recall@20
      value: 0.80367
    - type: recall@100
      value: 0.86667
    - type: recall@1000
      value: 0.945
    - type: precision@1
      value: 0.59333
    - type: precision@3
      value: 0.25667
    - type: precision@5
      value: 0.164
    - type: precision@10
      value: 0.08667
    - type: precision@20
      value: 0.04533
    - type: precision@100
      value: 0.0099
    - type: precision@1000
      value: 0.00107
    - type: mrr@1
      value: 0.59333
    - type: mrr@3
      value: 0.64667
    - type: mrr@5
      value: 0.65333
    - type: mrr@10
      value: 0.65883
    - type: mrr@20
      value: 0.66105
    - type: mrr@100
      value: 0.66254
    - type: mrr@1000
      value: 0.66292
    - type: main_score
      value: 0.68412
    task:
      type: Retrieval
---
|
|
|
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).
|
## Usage |
|
📌 **Tip**: For NV-Embed-V2, using Transformers versions **later** than 4.47.0 may lead to performance degradation, as `model_type=bidir_mistral` in `config.json` is no longer supported.

We therefore recommend pinning `transformers==4.47.0`.
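If you want to fail fast rather than risk silent degradation, a guard along these lines can help. This is an illustrative sketch, not part of the original instructions; the error message is ours:

```python
import transformers

# Version guard (illustrative): releases after 4.47.0 drop support for
# model_type=bidir_mistral in config.json, which NV-Embed-V2 relies on.
if transformers.__version__ != "4.47.0":
    raise RuntimeError(
        f"Found transformers {transformers.__version__}; "
        "this CSR checkpoint is validated with transformers==4.47.0."
    )
```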
|
|
|
### Sentence Transformers Usage |
|
You can load this model with Sentence Transformers and evaluate it on SciFact using the snippet below:
|
```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}
tasks = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    show_progress_bar=True,
    # MTEB doesn't support sparse tensors yet, so we convert to dense tensors.
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
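Beyond benchmark evaluation, the same checkpoint can be used for ad-hoc retrieval. The sketch below is illustrative and not from the original card: the claim and documents are invented, and it assumes `encode` returns dense embeddings when `convert_to_sparse_tensor=False` (consistent with the `encode_kwargs` used above):

```python
from sentence_transformers import SparseEncoder, util

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)

# The instruction prefix mirrors the SciFact-query prompt used for evaluation.
query = (
    "Instruct: Given a scientific claim, retrieve documents that support or "
    "refute the claim\nQuery: Vitamin D supplementation reduces fracture risk."
)
docs = [
    "A randomized trial found no effect of vitamin D on fracture incidence.",
    "Self-attention lets transformers model long-range token interactions.",
]

# Encode to dense tensors so standard similarity utilities apply.
q_emb = model.encode([query], convert_to_sparse_tensor=False)
d_emb = model.encode(docs, convert_to_sparse_tensor=False)

# Rank documents by dot-product similarity (higher = more relevant).
print(util.dot_score(q_emb, d_emb))
```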
|
|
|
## Citation |
|
```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning}
}
```