---
license: mit
datasets:
- mteb/scifact
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- CSR
model-index:
- name: CSR
  results:
    - dataset:
        name: MTEB SciFact
        type: mteb/scifact
        revision: 0228b52cf27578f30900b9e5271d331663a030d7
        config: default
        split: test
        languages:
          - eng-Latn
      metrics:
        - type: ndcg@1
          value: 0.59333
        - type: ndcg@3
          value: 0.65703
        - type: ndcg@5
          value: 0.67072
        - type: ndcg@10
          value: 0.68412
        - type: ndcg@20
          value: 0.69238
        - type: ndcg@100
          value: 0.70514
        - type: ndcg@1000
          value: 0.71517
        - type: map@1
          value: 0.5675
        - type: map@3
          value: 0.63602
        - type: map@5
          value: 0.64712
        - type: map@10
          value: 0.65301
        - type: map@20
          value: 0.65552
        - type: map@100
          value: 0.65778
        - type: map@1000
          value: 0.65815
        - type: recall@1
          value: 0.5675
        - type: recall@3
          value: 0.69772
        - type: recall@5
          value: 0.73367
        - type: recall@10
          value: 0.77333
        - type: recall@20
          value: 0.80367
        - type: recall@100
          value: 0.86667
        - type: recall@1000
          value: 0.945
        - type: precision@1
          value: 0.59333
        - type: precision@3
          value: 0.25667
        - type: precision@5
          value: 0.164
        - type: precision@10
          value: 0.08667
        - type: precision@20
          value: 0.04533
        - type: precision@100
          value: 0.0099
        - type: precision@1000
          value: 0.00107
        - type: mrr@1
          value: 0.59333
        - type: mrr@3
          value: 0.64667
        - type: mrr@5
          value: 0.65333
        - type: mrr@10
          value: 0.65883
        - type: mrr@20
          value: 0.66105
        - type: mrr@100
          value: 0.66254
        - type: mrr@1000
          value: 0.66292
        - type: main_score
          value: 0.68412
      task:
        type: Retrieval
---

For more details, including benchmark evaluations, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).


## Usage
📌 **Tip**: For NV-Embed-V2, using Transformers versions **later** than 4.47.0 may lead to performance degradation, as ``model_type=bidir_mistral`` in ``config.json`` is no longer supported.

We recommend using ``Transformers 4.47.0``.
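If a newer Transformers is already installed, you can pin the recommended version explicitly. This is a minimal sketch; the `sentence-transformers` floor is an assumption based on when `SparseEncoder` was introduced, and `mteb` is only needed for the evaluation snippet below.

```shell
# Pin Transformers to the version known to work with NV-Embed-V2's custom modeling code;
# SparseEncoder requires a recent sentence-transformers (v5+, to our knowledge).
pip install "transformers==4.47.0" "sentence-transformers>=5.0" mteb
```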

### Sentence Transformers Usage
You can evaluate this model on MTEB by loading it through Sentence Transformers with the following code snippet:
```python
import mteb
from sentence_transformers import SparseEncoder

# Load the CSR sparse encoder; trust_remote_code is required for the custom NV-Embed-V2 backbone.
model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}
task = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    show_progress_bar=True,
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)  # MTEB does not support sparse tensors yet, so we convert to dense tensors
```
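The `convert_to_sparse_tensor=False` flag matters because MTEB consumes dense arrays, while CSR embeddings are naturally sparse. A minimal, model-free sketch of that conversion in plain PyTorch (the toy tensor is illustrative, not a real embedding):

```python
import torch

# CSR embeddings are mostly zeros, so sparse storage keeps only the non-zero entries.
dense = torch.tensor([[0.0, 1.5, 0.0, 0.0, 2.0]])
sparse = dense.to_sparse()  # COO representation: indices + values
print(sparse._nnz())  # number of stored non-zero values: 2

# Converting back to dense round-trips losslessly, which is what the
# encode_kwargs flag above asks Sentence Transformers to hand MTEB.
assert torch.equal(sparse.to_dense(), dense)
```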

## Citation
```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```