---
license: mit
datasets:
- mteb/scifact
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- CSR
model-index:
- name: CSR
  results:
    - dataset:
        name: MTEB SciFact
        type: mteb/scifact
        revision: 0228b52cf27578f30900b9e5271d331663a030d7
        config: default
        split: test
        languages:
          - eng-Latn
      metrics:
        - type: ndcg@1
          value: 0.59333
        - type: ndcg@3
          value: 0.65703
        - type: ndcg@5
          value: 0.67072
        - type: ndcg@10
          value: 0.68412
        - type: ndcg@20
          value: 0.69238
        - type: ndcg@100
          value: 0.70514
        - type: ndcg@1000
          value: 0.71517
        - type: map@1
          value: 0.5675
        - type: map@3
          value: 0.63602
        - type: map@5
          value: 0.64712
        - type: map@10
          value: 0.65301
        - type: map@20
          value: 0.65552
        - type: map@100
          value: 0.65778
        - type: map@1000
          value: 0.65815
        - type: recall@1
          value: 0.5675
        - type: recall@3
          value: 0.69772
        - type: recall@5
          value: 0.73367
        - type: recall@10
          value: 0.77333
        - type: recall@20
          value: 0.80367
        - type: recall@100
          value: 0.86667
        - type: recall@1000
          value: 0.945
        - type: precision@1
          value: 0.59333
        - type: precision@3
          value: 0.25667
        - type: precision@5
          value: 0.164
        - type: precision@10
          value: 0.08667
        - type: precision@20
          value: 0.04533
        - type: precision@100
          value: 0.0099
        - type: precision@1000
          value: 0.00107
        - type: mrr@1
          value: 0.59333
        - type: mrr@3
          value: 0.64667
        - type: mrr@5
          value: 0.65333
        - type: mrr@10
          value: 0.65883
        - type: mrr@20
          value: 0.66105
        - type: mrr@100
          value: 0.66254
        - type: mrr@1000
          value: 0.66292
        - type: main_score
          value: 0.68412
      task:
        type: Retrieval
---

For more details, including benchmark evaluations, hardware requirements, and inference performance, please refer to our [GitHub repository](https://github.com/neilwen987/CSR_Adaptive_Rep).


## Usage
📌 **Tip**: For NV-Embed-V2, using Transformers versions **later** than 4.47.0 may lead to performance degradation, as ``model_type=bidir_mistral`` in ``config.json`` is no longer supported.

We recommend using ``Transformers 4.47.0``.
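If a newer Transformers is already installed, you can pin the recommended version explicitly. This is a minimal sketch; the `sentence-transformers` floor is an assumption based on when `SparseEncoder` was introduced, and `mteb` is only needed for the evaluation snippet below.

```shell
# Pin Transformers to the version known to work with NV-Embed-V2's custom modeling code;
# SparseEncoder requires a recent sentence-transformers (v5+, to our knowledge).
pip install "transformers==4.47.0" "sentence-transformers>=5.0" mteb
```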

### Sentence Transformers Usage
You can evaluate this model on MTEB by loading it through Sentence Transformers with the following code snippet:
```python
import mteb
from sentence_transformers import SparseEncoder

# Load the CSR sparse encoder; trust_remote_code is required for the custom NV-Embed-V2 backbone.
model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True,
)
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}
task = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    show_progress_bar=True,
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)  # MTEB does not support sparse tensors yet, so we convert to dense tensors
```
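The `convert_to_sparse_tensor=False` flag matters because MTEB consumes dense arrays, while CSR embeddings are naturally sparse. A minimal, model-free sketch of that conversion in plain PyTorch (the toy tensor is illustrative, not a real embedding):

```python
import torch

# CSR embeddings are mostly zeros, so sparse storage keeps only the non-zero entries.
dense = torch.tensor([[0.0, 1.5, 0.0, 0.0, 2.0]])
sparse = dense.to_sparse()  # COO representation: indices + values
print(sparse._nnz())  # number of stored non-zero values: 2

# Converting back to dense round-trips losslessly, which is what the
# encode_kwargs flag above asks Sentence Transformers to hand MTEB.
assert torch.equal(sparse.to_dense(), dense)
```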

## Citation
```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning},
  year={2025}
}
```