---
license: mit
datasets:
- mteb/scifact
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- CSR
model-index:
- name: CSR
  results:
  - dataset:
      name: MTEB SciFact
      type: mteb/scifact
      revision: 0228b52cf27578f30900b9e5271d331663a030d7
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.59333
    - type: ndcg@3
      value: 0.65703
    - type: ndcg@5
      value: 0.67072
    - type: ndcg@10
      value: 0.68412
    - type: ndcg@20
      value: 0.69238
    - type: ndcg@100
      value: 0.70514
    - type: ndcg@1000
      value: 0.71517
    - type: map@1
      value: 0.5675
    - type: map@3
      value: 0.63602
    - type: map@5
      value: 0.64712
    - type: map@10
      value: 0.65301
    - type: map@20
      value: 0.65552
    - type: map@100
      value: 0.65778
    - type: map@1000
      value: 0.65815
    - type: recall@1
      value: 0.5675
    - type: recall@3
      value: 0.69772
    - type: recall@5
      value: 0.73367
    - type: recall@10
      value: 0.77333
    - type: recall@20
      value: 0.80367
    - type: recall@100
      value: 0.86667
    - type: recall@1000
      value: 0.945
    - type: precision@1
      value: 0.59333
    - type: precision@3
      value: 0.25667
    - type: precision@5
      value: 0.164
    - type: precision@10
      value: 0.08667
    - type: precision@20
      value: 0.04533
    - type: precision@100
      value: 0.0099
    - type: precision@1000
      value: 0.00107
    - type: mrr@1
      value: 0.59333
    - type: mrr@3
      value: 0.64667
    - type: mrr@5
      value: 0.65333
    - type: mrr@10
      value: 0.65883
    - type: mrr@20
      value: 0.66105
    - type: mrr@100
      value: 0.66254
    - type: mrr@1000
      value: 0.66292
    - type: main_score
      value: 0.68412
    task:
      type: Retrieval
---
For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).
## Usage
📌 **Tip**: For NV-Embed-V2, Transformers versions **later** than 4.47.0 may lead to performance degradation, because ``model_type=bidir_mistral`` in ``config.json`` is no longer supported.
We therefore recommend pinning ``transformers==4.47.0``.
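If you want to guard against an incompatible environment, you can assert the installed version at runtime. This is a minimal convenience sketch using only the Python standard library; it is not part of the CSR tooling:

```python
# Verify the pinned Transformers version before loading the model.
# Convenience sketch only; not part of the model's own tooling.
from importlib.metadata import version

installed = version("transformers")
assert installed == "4.47.0", (
    f"Found transformers {installed}; this model card recommends 4.47.0 "
    "because model_type=bidir_mistral is unsupported in later releases."
)
```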
### Sentence Transformers Usage
You can evaluate this model, loaded via Sentence Transformers, with the following code snippet:
```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)
# Prompt prepended to SciFact queries during evaluation.
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}

tasks = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=tasks)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    show_progress_bar=True,
    # MTEB doesn't support sparse tensors yet, so convert to dense tensors.
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```
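
Beyond MTEB evaluation, you can also query the model directly for ad-hoc retrieval through the generic Sentence Transformers ``encode``/``similarity`` API. The sketch below is illustrative: the example claim and documents are placeholders, not drawn from SciFact.

```python
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)

# Prepend the task instruction to the query, mirroring the prompt above.
query = (
    "Instruct: Given a scientific claim, retrieve documents that support "
    "or refute the claim\nQuery: Vitamin D supplementation reduces fracture risk."
)
documents = [
    "A randomized trial found no effect of vitamin D on fracture incidence.",
    "CRISPR-Cas9 enables targeted genome editing in mammalian cells.",
]

# Dense tensors are easier to score directly than sparse ones.
query_emb = model.encode([query], convert_to_sparse_tensor=False)
doc_emb = model.encode(documents, convert_to_sparse_tensor=False)

# Score documents against the claim; higher means more relevant.
scores = model.similarity(query_emb, doc_emb)
print(scores)
```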
## Citation
```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
``` |