---
license: mit
datasets:
- mteb/scifact
language:
- en
pipeline_tag: text-retrieval
library_name: sentence-transformers
tags:
- mteb
- text
- transformers
- text-embeddings-inference
- CSR
model-index:
- name: CSR
  results:
  - dataset:
      name: MTEB SciFact
      type: mteb/scifact
      revision: 0228b52cf27578f30900b9e5271d331663a030d7
      config: default
      split: test
      languages:
      - eng-Latn
    metrics:
    - type: ndcg@1
      value: 0.59333
    - type: ndcg@3
      value: 0.65703
    - type: ndcg@5
      value: 0.67072
    - type: ndcg@10
      value: 0.68412
    - type: ndcg@20
      value: 0.69238
    - type: ndcg@100
      value: 0.70514
    - type: ndcg@1000
      value: 0.71517
    - type: map@1
      value: 0.5675
    - type: map@3
      value: 0.63602
    - type: map@5
      value: 0.64712
    - type: map@10
      value: 0.65301
    - type: map@20
      value: 0.65552
    - type: map@100
      value: 0.65778
    - type: map@1000
      value: 0.65815
    - type: recall@1
      value: 0.5675
    - type: recall@3
      value: 0.69772
    - type: recall@5
      value: 0.73367
    - type: recall@10
      value: 0.77333
    - type: recall@20
      value: 0.80367
    - type: recall@100
      value: 0.86667
    - type: recall@1000
      value: 0.945
    - type: precision@1
      value: 0.59333
    - type: precision@3
      value: 0.25667
    - type: precision@5
      value: 0.164
    - type: precision@10
      value: 0.08667
    - type: precision@20
      value: 0.04533
    - type: precision@100
      value: 0.0099
    - type: precision@1000
      value: 0.00107
    - type: mrr@1
      value: 0.59333
    - type: mrr@3
      value: 0.64667
    - type: mrr@5
      value: 0.65333
    - type: mrr@10
      value: 0.65883
    - type: mrr@20
      value: 0.66105
    - type: mrr@100
      value: 0.66254
    - type: mrr@1000
      value: 0.66292
    - type: main_score
      value: 0.68412
    task:
      type: Retrieval
---

For more details, including benchmark evaluation, hardware requirements, and inference performance, please refer to our [GitHub](https://github.com/neilwen987/CSR_Adaptive_Rep).

## Usage

📌 **Tip**: For NV-Embed-V2, using Transformers versions **later** than 4.47.0 may lead to performance degradation, as ``model_type=bidir_mistral`` in ``config.json`` is no longer supported.
We recommend using ``transformers==4.47.0``.

### Sentence Transformers Usage

You can evaluate this model loaded by Sentence Transformers with the following code snippet:

```python
import mteb
from sentence_transformers import SparseEncoder

model = SparseEncoder(
    "Y-Research-Group/CSR-NV_Embed_v2-Retrieval-SciFACT",
    trust_remote_code=True
)
model.prompts = {
    "SciFact-query": "Instruct: Given a scientific claim, retrieve documents that support or refute the claim\nQuery:"
}
task = mteb.get_tasks(tasks=["SciFact"])
evaluation = mteb.MTEB(tasks=task)
evaluation.run(
    model,
    eval_splits=["test"],
    output_folder="./results/SciFact",
    show_progress_bar=True,
    # MTEB doesn't support sparse tensors yet, so we convert the embeddings to dense tensors
    encode_kwargs={"convert_to_sparse_tensor": False, "batch_size": 8},
)
```

## Citation

```bibtex
@inproceedings{wenbeyond,
  title={Beyond Matryoshka: Revisiting Sparse Coding for Adaptive Representation},
  author={Wen, Tiansheng and Wang, Yifei and Zeng, Zequn and Peng, Zhong and Su, Yudi and Liu, Xinyang and Chen, Bo and Liu, Hongwei and Jegelka, Stefanie and You, Chenyu},
  booktitle={Forty-second International Conference on Machine Learning}
}
```
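As noted in the usage snippet, MTEB does not yet operate on sparse tensors, which is why `convert_to_sparse_tensor=False` returns dense embeddings. A minimal pure-Python sketch of that sparse-to-dense expansion (the `sparse_to_dense` helper is hypothetical, for illustration only, and not part of sentence-transformers):

```python
def sparse_to_dense(indices, values, dim):
    """Expand a sparse embedding, stored as (active indices, their values),
    into a dense vector of length dim with zeros elsewhere."""
    dense = [0.0] * dim
    for i, v in zip(indices, values):
        dense[i] = v
    return dense

# Example: an embedding with 3 active dimensions in an 8-dim space
vec = sparse_to_dense([1, 4, 6], [0.5, -0.2, 0.9], 8)
print(vec)  # [0.0, 0.5, 0.0, 0.0, -0.2, 0.0, 0.9, 0.0]
```

CSR embeddings keep only a small number of active dimensions, so the sparse form is compact; densifying them trades memory for compatibility with tooling that expects ordinary vectors.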