xin0920 committed
Commit d9b770f · 1 Parent(s): 665f6ff

First commit
1_Pooling/config.json ADDED
@@ -0,0 +1,10 @@
+ {
+   "word_embedding_dimension": 4096,
+   "pooling_mode_cls_token": false,
+   "pooling_mode_mean_tokens": true,
+   "pooling_mode_max_tokens": false,
+   "pooling_mode_mean_sqrt_len_tokens": false,
+   "pooling_mode_weightedmean_tokens": false,
+   "pooling_mode_lasttoken": false,
+   "include_prompt": false
+ }
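The pooling config above enables only `pooling_mode_mean_tokens`: the sentence embedding is the mean of the token embeddings, counting only positions where the attention mask is 1. A minimal sketch of that operation (illustrative only, not the repository's code; the real module operates on batched tensors):

```python
# Sketch of mean pooling as selected by 1_Pooling/config.json
# (pooling_mode_mean_tokens = true). Padded positions, where the
# attention mask is 0, are excluded from the average.

def mean_pool(token_embeddings, attention_mask):
    """token_embeddings: list of per-token vectors; attention_mask: list of 0/1."""
    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vec, mask in zip(token_embeddings, attention_mask):
        if mask:
            total = [t + v for t, v in zip(total, vec)]
            count += 1
    return [t / max(count, 1) for t in total]

# Two real tokens followed by one padded position that is ignored.
emb = mean_pool([[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]], [1, 1, 0])
# emb == [2.0, 3.0]
```

In the real model each vector has `word_embedding_dimension` = 4096 entries; `include_prompt: false` additionally excludes instruction-prompt tokens from the average.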
3_CSRSparsity/config.json ADDED
@@ -0,0 +1,8 @@
+ {
+   "input_dim": 4096,
+   "hidden_dim": 16384,
+   "k": 32,
+   "k_aux": 512,
+   "normalize": false,
+   "dead_threshold": 30
+ }
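Reading the CSRSparsity config as a top-k sparse head (an assumption based on the key names: a 4096-d embedding is projected to `hidden_dim` = 16384 and only the `k` = 32 largest activations are kept), the core operation can be sketched as:

```python
# Hedged sketch of top-k sparsification as suggested by
# 3_CSRSparsity/config.json ("k": 32). Interpretation of the keys is an
# assumption; this is not the repository's implementation.

def top_k_sparsify(activations, k):
    """Keep the k largest activations unchanged and zero out the rest."""
    if k >= len(activations):
        return list(activations)
    largest = sorted(range(len(activations)),
                     key=lambda i: activations[i], reverse=True)[:k]
    keep = set(largest)
    return [a if i in keep else 0.0 for i, a in enumerate(activations)]

sparse = top_k_sparsify([0.1, 3.0, -1.0, 2.5, 0.0], k=2)
# Only the two largest activations (3.0 and 2.5) survive.
```

The remaining keys presumably tune training rather than inference, e.g. `k_aux` and `dead_threshold` for reviving rarely-firing hidden units.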
3_CSRSparsity/pytorch_model.bin ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:5ef94bb69abc22145d0f36dccab1a7aac999a5b14227a82df03c37e019972784
+ size 268650816
README.md ADDED
@@ -0,0 +1,2188 @@
+ ---
+ tags:
+ - mteb
+ - sentence-transformers
+ model-index:
+ - name: NV-Embed-v2
+   results:
+   - dataset:
+       config: en
+       name: MTEB AmazonCounterfactualClassification (en)
+       revision: e8379541af4e31359cca9fbcf4b00f2671dba205
+       split: test
+       type: mteb/amazon_counterfactual
+     metrics:
+     - type: accuracy
+       value: 94.28358208955224
+     - type: accuracy_stderr
+       value: 0.40076780842082305
+     - type: ap
+       value: 76.49097318319616
+     - type: ap_stderr
+       value: 1.2418692675183929
+     - type: f1
+       value: 91.41982003001168
+     - type: f1_stderr
+       value: 0.5043921413093579
+     - type: main_score
+       value: 94.28358208955224
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB AmazonPolarityClassification
+       revision: e2d317d38cd51312af73b3d32a06d1a08b442046
+       split: test
+       type: mteb/amazon_polarity
+     metrics:
+     - type: accuracy
+       value: 97.74185000000001
+     - type: accuracy_stderr
+       value: 0.07420471683120942
+     - type: ap
+       value: 96.4737144875525
+     - type: ap_stderr
+       value: 0.2977518241541558
+     - type: f1
+       value: 97.7417581594921
+     - type: f1_stderr
+       value: 0.07428763617010377
+     - type: main_score
+       value: 97.74185000000001
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB AmazonReviewsClassification (en)
+       revision: 1399c76144fd37290681b995c656ef9b2e06e26d
+       split: test
+       type: mteb/amazon_reviews_multi
+     metrics:
+     - type: accuracy
+       value: 63.96000000000001
+     - type: accuracy_stderr
+       value: 1.815555011559825
+     - type: f1
+       value: 62.49361841640459
+     - type: f1_stderr
+       value: 2.829339314126457
+     - type: main_score
+       value: 63.96000000000001
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB ArguAna
+       revision: c22ab2a51041ffd869aaddef7af8d8215647e41a
+       split: test
+       type: mteb/arguana
+     metrics:
+     - type: map_at_1
+       value: 46.515
+     - type: map_at_10
+       value: 62.392
+     - type: map_at_100
+       value: 62.732
+     - type: map_at_1000
+       value: 62.733000000000004
+     - type: map_at_3
+       value: 58.701
+     - type: map_at_5
+       value: 61.027
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 46.515
+     - type: ndcg_at_10
+       value: 70.074
+     - type: ndcg_at_100
+       value: 71.395
+     - type: ndcg_at_1000
+       value: 71.405
+     - type: ndcg_at_3
+       value: 62.643
+     - type: ndcg_at_5
+       value: 66.803
+     - type: precision_at_1
+       value: 46.515
+     - type: precision_at_10
+       value: 9.41
+     - type: precision_at_100
+       value: 0.996
+     - type: precision_at_1000
+       value: 0.1
+     - type: precision_at_3
+       value: 24.68
+     - type: precision_at_5
+       value: 16.814
+     - type: recall_at_1
+       value: 46.515
+     - type: recall_at_10
+       value: 94.097
+     - type: recall_at_100
+       value: 99.57300000000001
+     - type: recall_at_1000
+       value: 99.644
+     - type: recall_at_3
+       value: 74.03999999999999
+     - type: recall_at_5
+       value: 84.068
+     - type: main_score
+       value: 70.074
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB ArxivClusteringP2P
+       revision: a122ad7f3f0291bf49cc6f4d32aa80929df69d5d
+       split: test
+       type: mteb/arxiv-clustering-p2p
+     metrics:
+     - type: main_score
+       value: 55.79933795955242
+     - type: v_measure
+       value: 55.79933795955242
+     - type: v_measure_std
+       value: 14.575108141916148
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB ArxivClusteringS2S
+       revision: f910caf1a6075f7329cdf8c1a6135696f37dbd53
+       split: test
+       type: mteb/arxiv-clustering-s2s
+     metrics:
+     - type: main_score
+       value: 51.262845995850334
+     - type: v_measure
+       value: 51.262845995850334
+     - type: v_measure_std
+       value: 14.727824473104173
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB AskUbuntuDupQuestions
+       revision: 2000358ca161889fa9c082cb41daa8dcfb161a54
+       split: test
+       type: mteb/askubuntudupquestions-reranking
+     metrics:
+     - type: map
+       value: 67.46477327480808
+     - type: mrr
+       value: 79.50160488941653
+     - type: main_score
+       value: 67.46477327480808
+     task:
+       type: Reranking
+   - dataset:
+       config: default
+       name: MTEB BIOSSES
+       revision: d3fb88f8f02e40887cd149695127462bbcf29b4a
+       split: test
+       type: mteb/biosses-sts
+     metrics:
+     - type: cosine_pearson
+       value: 89.74311007980987
+     - type: cosine_spearman
+       value: 87.41644967443246
+     - type: manhattan_pearson
+       value: 88.57457108347744
+     - type: manhattan_spearman
+       value: 87.59295972042997
+     - type: euclidean_pearson
+       value: 88.27108977118459
+     - type: euclidean_spearman
+       value: 87.41644967443246
+     - type: main_score
+       value: 87.41644967443246
+     task:
+       type: STS
+   - dataset:
+       config: default
+       name: MTEB Banking77Classification
+       revision: 0fd18e25b25c072e09e0d92ab615fda904d66300
+       split: test
+       type: mteb/banking77
+     metrics:
+     - type: accuracy
+       value: 92.41558441558443
+     - type: accuracy_stderr
+       value: 0.37701502251934443
+     - type: f1
+       value: 92.38130170447671
+     - type: f1_stderr
+       value: 0.39115151225617767
+     - type: main_score
+       value: 92.41558441558443
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB BiorxivClusteringP2P
+       revision: 65b79d1d13f80053f67aca9498d9402c2d9f1f40
+       split: test
+       type: mteb/biorxiv-clustering-p2p
+     metrics:
+     - type: main_score
+       value: 54.08649516394218
+     - type: v_measure
+       value: 54.08649516394218
+     - type: v_measure_std
+       value: 0.5303233693045373
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB BiorxivClusteringS2S
+       revision: 258694dd0231531bc1fd9de6ceb52a0853c6d908
+       split: test
+       type: mteb/biorxiv-clustering-s2s
+     metrics:
+     - type: main_score
+       value: 49.60352214167779
+     - type: v_measure
+       value: 49.60352214167779
+     - type: v_measure_std
+       value: 0.7176198612516721
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB CQADupstackRetrieval
+       revision: 46989137a86843e03a6195de44b09deda022eec7
+       split: test
+       type: CQADupstackRetrieval_is_a_combined_dataset
+     metrics:
+     - type: map_at_1
+       value: 31.913249999999998
+     - type: map_at_10
+       value: 43.87733333333334
+     - type: map_at_100
+       value: 45.249916666666664
+     - type: map_at_1000
+       value: 45.350583333333326
+     - type: map_at_3
+       value: 40.316833333333335
+     - type: map_at_5
+       value: 42.317083333333336
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 38.30616666666667
+     - type: ndcg_at_10
+       value: 50.24175000000001
+     - type: ndcg_at_100
+       value: 55.345333333333336
+     - type: ndcg_at_1000
+       value: 56.91225000000001
+     - type: ndcg_at_3
+       value: 44.67558333333333
+     - type: ndcg_at_5
+       value: 47.32333333333334
+     - type: precision_at_1
+       value: 38.30616666666667
+     - type: precision_at_10
+       value: 9.007416666666666
+     - type: precision_at_100
+       value: 1.3633333333333333
+     - type: precision_at_1000
+       value: 0.16691666666666666
+     - type: precision_at_3
+       value: 20.895666666666667
+     - type: precision_at_5
+       value: 14.871666666666666
+     - type: recall_at_1
+       value: 31.913249999999998
+     - type: recall_at_10
+       value: 64.11891666666666
+     - type: recall_at_100
+       value: 85.91133333333333
+     - type: recall_at_1000
+       value: 96.28225
+     - type: recall_at_3
+       value: 48.54749999999999
+     - type: recall_at_5
+       value: 55.44283333333334
+     - type: main_score
+       value: 50.24175000000001
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB ClimateFEVER
+       revision: 47f2ac6acb640fc46020b02a5b59fdda04d39380
+       split: test
+       type: mteb/climate-fever
+     metrics:
+     - type: map_at_1
+       value: 19.556
+     - type: map_at_10
+       value: 34.623
+     - type: map_at_100
+       value: 36.97
+     - type: map_at_1000
+       value: 37.123
+     - type: map_at_3
+       value: 28.904999999999998
+     - type: map_at_5
+       value: 31.955
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 44.104
+     - type: ndcg_at_10
+       value: 45.388
+     - type: ndcg_at_100
+       value: 52.793
+     - type: ndcg_at_1000
+       value: 55.108999999999995
+     - type: ndcg_at_3
+       value: 38.604
+     - type: ndcg_at_5
+       value: 40.806
+     - type: precision_at_1
+       value: 44.104
+     - type: precision_at_10
+       value: 14.143
+     - type: precision_at_100
+       value: 2.2190000000000003
+     - type: precision_at_1000
+       value: 0.266
+     - type: precision_at_3
+       value: 29.316
+     - type: precision_at_5
+       value: 21.98
+     - type: recall_at_1
+       value: 19.556
+     - type: recall_at_10
+       value: 52.120999999999995
+     - type: recall_at_100
+       value: 76.509
+     - type: recall_at_1000
+       value: 89.029
+     - type: recall_at_3
+       value: 34.919
+     - type: recall_at_5
+       value: 42.18
+     - type: main_score
+       value: 45.388
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB DBPedia
+       revision: c0f706b76e590d620bd6618b3ca8efdd34e2d659
+       split: test
+       type: mteb/dbpedia
+     metrics:
+     - type: map_at_1
+       value: 10.714
+     - type: map_at_10
+       value: 25.814999999999998
+     - type: map_at_100
+       value: 37.845
+     - type: map_at_1000
+       value: 39.974
+     - type: map_at_3
+       value: 17.201
+     - type: map_at_5
+       value: 21.062
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 66.0
+     - type: ndcg_at_10
+       value: 53.496
+     - type: ndcg_at_100
+       value: 58.053
+     - type: ndcg_at_1000
+       value: 64.886
+     - type: ndcg_at_3
+       value: 57.656
+     - type: ndcg_at_5
+       value: 55.900000000000006
+     - type: precision_at_1
+       value: 77.25
+     - type: precision_at_10
+       value: 43.65
+     - type: precision_at_100
+       value: 13.76
+     - type: precision_at_1000
+       value: 2.5940000000000003
+     - type: precision_at_3
+       value: 61.0
+     - type: precision_at_5
+       value: 54.65
+     - type: recall_at_1
+       value: 10.714
+     - type: recall_at_10
+       value: 31.173000000000002
+     - type: recall_at_100
+       value: 63.404
+     - type: recall_at_1000
+       value: 85.874
+     - type: recall_at_3
+       value: 18.249000000000002
+     - type: recall_at_5
+       value: 23.69
+     - type: main_score
+       value: 53.496
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB EmotionClassification
+       revision: 4f58c6b202a23cf9a4da393831edf4f9183cad37
+       split: test
+       type: mteb/emotion
+     metrics:
+     - type: accuracy
+       value: 93.38499999999999
+     - type: accuracy_stderr
+       value: 0.13793114224133846
+     - type: f1
+       value: 90.12141028353496
+     - type: f1_stderr
+       value: 0.174640257706043
+     - type: main_score
+       value: 93.38499999999999
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB FEVER
+       revision: bea83ef9e8fb933d90a2f1d5515737465d613e12
+       split: test
+       type: mteb/fever
+     metrics:
+     - type: map_at_1
+       value: 84.66900000000001
+     - type: map_at_10
+       value: 91.52799999999999
+     - type: map_at_100
+       value: 91.721
+     - type: map_at_1000
+       value: 91.73
+     - type: map_at_3
+       value: 90.752
+     - type: map_at_5
+       value: 91.262
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 91.20899999999999
+     - type: ndcg_at_10
+       value: 93.74900000000001
+     - type: ndcg_at_100
+       value: 94.279
+     - type: ndcg_at_1000
+       value: 94.408
+     - type: ndcg_at_3
+       value: 92.923
+     - type: ndcg_at_5
+       value: 93.376
+     - type: precision_at_1
+       value: 91.20899999999999
+     - type: precision_at_10
+       value: 11.059
+     - type: precision_at_100
+       value: 1.1560000000000001
+     - type: precision_at_1000
+       value: 0.11800000000000001
+     - type: precision_at_3
+       value: 35.129
+     - type: precision_at_5
+       value: 21.617
+     - type: recall_at_1
+       value: 84.66900000000001
+     - type: recall_at_10
+       value: 97.03399999999999
+     - type: recall_at_100
+       value: 98.931
+     - type: recall_at_1000
+       value: 99.65899999999999
+     - type: recall_at_3
+       value: 94.76299999999999
+     - type: recall_at_5
+       value: 95.968
+     - type: main_score
+       value: 93.74900000000001
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB FiQA2018
+       revision: 27a168819829fe9bcd655c2df245fb19452e8e06
+       split: test
+       type: mteb/fiqa
+     metrics:
+     - type: map_at_1
+       value: 34.866
+     - type: map_at_10
+       value: 58.06099999999999
+     - type: map_at_100
+       value: 60.028999999999996
+     - type: map_at_1000
+       value: 60.119
+     - type: map_at_3
+       value: 51.304
+     - type: map_at_5
+       value: 55.054
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 64.815
+     - type: ndcg_at_10
+       value: 65.729
+     - type: ndcg_at_100
+       value: 71.14
+     - type: ndcg_at_1000
+       value: 72.336
+     - type: ndcg_at_3
+       value: 61.973
+     - type: ndcg_at_5
+       value: 62.858000000000004
+     - type: precision_at_1
+       value: 64.815
+     - type: precision_at_10
+       value: 17.87
+     - type: precision_at_100
+       value: 2.373
+     - type: precision_at_1000
+       value: 0.258
+     - type: precision_at_3
+       value: 41.152
+     - type: precision_at_5
+       value: 29.568
+     - type: recall_at_1
+       value: 34.866
+     - type: recall_at_10
+       value: 72.239
+     - type: recall_at_100
+       value: 91.19
+     - type: recall_at_1000
+       value: 98.154
+     - type: recall_at_3
+       value: 56.472
+     - type: recall_at_5
+       value: 63.157
+     - type: main_score
+       value: 65.729
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB HotpotQA
+       revision: ab518f4d6fcca38d87c25209f94beba119d02014
+       split: test
+       type: mteb/hotpotqa
+     metrics:
+     - type: map_at_1
+       value: 44.651999999999994
+     - type: map_at_10
+       value: 79.95100000000001
+     - type: map_at_100
+       value: 80.51700000000001
+     - type: map_at_1000
+       value: 80.542
+     - type: map_at_3
+       value: 77.008
+     - type: map_at_5
+       value: 78.935
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 89.305
+     - type: ndcg_at_10
+       value: 85.479
+     - type: ndcg_at_100
+       value: 87.235
+     - type: ndcg_at_1000
+       value: 87.669
+     - type: ndcg_at_3
+       value: 81.648
+     - type: ndcg_at_5
+       value: 83.88600000000001
+     - type: precision_at_1
+       value: 89.305
+     - type: precision_at_10
+       value: 17.807000000000002
+     - type: precision_at_100
+       value: 1.9140000000000001
+     - type: precision_at_1000
+       value: 0.197
+     - type: precision_at_3
+       value: 53.756
+     - type: precision_at_5
+       value: 34.018
+     - type: recall_at_1
+       value: 44.651999999999994
+     - type: recall_at_10
+       value: 89.034
+     - type: recall_at_100
+       value: 95.719
+     - type: recall_at_1000
+       value: 98.535
+     - type: recall_at_3
+       value: 80.635
+     - type: recall_at_5
+       value: 85.044
+     - type: main_score
+       value: 85.479
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB ImdbClassification
+       revision: 3d86128a09e091d6018b6d26cad27f2739fc2db7
+       split: test
+       type: mteb/imdb
+     metrics:
+     - type: accuracy
+       value: 97.1376
+     - type: accuracy_stderr
+       value: 0.04571914259913447
+     - type: ap
+       value: 95.92783808558808
+     - type: ap_stderr
+       value: 0.05063782483358255
+     - type: f1
+       value: 97.13755519177172
+     - type: f1_stderr
+       value: 0.04575943074086138
+     - type: main_score
+       value: 97.1376
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB MSMARCO
+       revision: c5a29a104738b98a9e76336939199e264163d4a0
+       split: dev
+       type: mteb/msmarco
+     metrics:
+     - type: map_at_1
+       value: 0.0
+     - type: map_at_10
+       value: 38.342
+     - type: map_at_100
+       value: 0.0
+     - type: map_at_1000
+       value: 0.0
+     - type: map_at_3
+       value: 0.0
+     - type: map_at_5
+       value: 0.0
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 0.0
+     - type: ndcg_at_10
+       value: 45.629999999999995
+     - type: ndcg_at_100
+       value: 0.0
+     - type: ndcg_at_1000
+       value: 0.0
+     - type: ndcg_at_3
+       value: 0.0
+     - type: ndcg_at_5
+       value: 0.0
+     - type: precision_at_1
+       value: 0.0
+     - type: precision_at_10
+       value: 7.119000000000001
+     - type: precision_at_100
+       value: 0.0
+     - type: precision_at_1000
+       value: 0.0
+     - type: precision_at_3
+       value: 0.0
+     - type: precision_at_5
+       value: 0.0
+     - type: recall_at_1
+       value: 0.0
+     - type: recall_at_10
+       value: 67.972
+     - type: recall_at_100
+       value: 0.0
+     - type: recall_at_1000
+       value: 0.0
+     - type: recall_at_3
+       value: 0.0
+     - type: recall_at_5
+       value: 0.0
+     - type: main_score
+       value: 45.629999999999995
+     task:
+       type: Retrieval
+   - dataset:
+       config: en
+       name: MTEB MTOPDomainClassification (en)
+       revision: d80d48c1eb48d3562165c59d59d0034df9fff0bf
+       split: test
+       type: mteb/mtop_domain
+     metrics:
+     - type: accuracy
+       value: 99.24988600091199
+     - type: accuracy_stderr
+       value: 0.04496826931900734
+     - type: f1
+       value: 99.15933275095276
+     - type: f1_stderr
+       value: 0.05565039139747446
+     - type: main_score
+       value: 99.24988600091199
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB MTOPIntentClassification (en)
+       revision: ae001d0e6b1228650b7bd1c2c65fb50ad11a8aba
+       split: test
+       type: mteb/mtop_intent
+     metrics:
+     - type: accuracy
+       value: 94.3684450524396
+     - type: accuracy_stderr
+       value: 0.8436548701322188
+     - type: f1
+       value: 77.33022623133307
+     - type: f1_stderr
+       value: 0.9228425861187275
+     - type: main_score
+       value: 94.3684450524396
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB MassiveIntentClassification (en)
+       revision: 31efe3c427b0bae9c22cbb560b8f15491cc6bed7
+       split: test
+       type: mteb/amazon_massive_intent
+     metrics:
+     - type: accuracy
+       value: 86.09616677874916
+     - type: accuracy_stderr
+       value: 0.9943208055590853
+     - type: f1
+       value: 83.4902056490062
+     - type: f1_stderr
+       value: 0.7626189310074184
+     - type: main_score
+       value: 86.09616677874916
+     task:
+       type: Classification
+   - dataset:
+       config: en
+       name: MTEB MassiveScenarioClassification (en)
+       revision: 7d571f92784cd94a019292a1f45445077d0ef634
+       split: test
+       type: mteb/amazon_massive_scenario
+     metrics:
+     - type: accuracy
+       value: 92.17215870880968
+     - type: accuracy_stderr
+       value: 0.25949941333658166
+     - type: f1
+       value: 91.36757392422702
+     - type: f1_stderr
+       value: 0.29139507298154815
+     - type: main_score
+       value: 92.17215870880968
+     task:
+       type: Classification
+   - dataset:
+       config: default
+       name: MTEB MedrxivClusteringP2P
+       revision: e7a26af6f3ae46b30dde8737f02c07b1505bcc73
+       split: test
+       type: mteb/medrxiv-clustering-p2p
+     metrics:
+     - type: main_score
+       value: 46.09497344077905
+     - type: v_measure
+       value: 46.09497344077905
+     - type: v_measure_std
+       value: 1.44871520869784
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB MedrxivClusteringS2S
+       revision: 35191c8c0dca72d8ff3efcd72aa802307d469663
+       split: test
+       type: mteb/medrxiv-clustering-s2s
+     metrics:
+     - type: main_score
+       value: 44.861049989560684
+     - type: v_measure
+       value: 44.861049989560684
+     - type: v_measure_std
+       value: 1.432199293162203
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB MindSmallReranking
+       revision: 3bdac13927fdc888b903db93b2ffdbd90b295a69
+       split: test
+       type: mteb/mind_small
+     metrics:
+     - type: map
+       value: 31.75936162919999
+     - type: mrr
+       value: 32.966812736541236
+     - type: main_score
+       value: 31.75936162919999
+     task:
+       type: Reranking
+   - dataset:
+       config: default
+       name: MTEB NFCorpus
+       revision: ec0fa4fe99da2ff19ca1214b7966684033a58814
+       split: test
+       type: mteb/nfcorpus
+     metrics:
+     - type: map_at_1
+       value: 7.893999999999999
+     - type: map_at_10
+       value: 17.95
+     - type: map_at_100
+       value: 23.474
+     - type: map_at_1000
+       value: 25.412000000000003
+     - type: map_at_3
+       value: 12.884
+     - type: map_at_5
+       value: 15.171000000000001
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 55.728
+     - type: ndcg_at_10
+       value: 45.174
+     - type: ndcg_at_100
+       value: 42.18
+     - type: ndcg_at_1000
+       value: 50.793
+     - type: ndcg_at_3
+       value: 50.322
+     - type: ndcg_at_5
+       value: 48.244
+     - type: precision_at_1
+       value: 57.276
+     - type: precision_at_10
+       value: 33.437
+     - type: precision_at_100
+       value: 10.671999999999999
+     - type: precision_at_1000
+       value: 2.407
+     - type: precision_at_3
+       value: 46.646
+     - type: precision_at_5
+       value: 41.672
+     - type: recall_at_1
+       value: 7.893999999999999
+     - type: recall_at_10
+       value: 22.831000000000003
+     - type: recall_at_100
+       value: 43.818
+     - type: recall_at_1000
+       value: 75.009
+     - type: recall_at_3
+       value: 14.371
+     - type: recall_at_5
+       value: 17.752000000000002
+     - type: main_score
+       value: 45.174
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB NQ
+       revision: b774495ed302d8c44a3a7ea25c90dbce03968f31
+       split: test
+       type: mteb/nq
+     metrics:
+     - type: map_at_1
+       value: 49.351
+     - type: map_at_10
+       value: 66.682
+     - type: map_at_100
+       value: 67.179
+     - type: map_at_1000
+       value: 67.18499999999999
+     - type: map_at_3
+       value: 62.958999999999996
+     - type: map_at_5
+       value: 65.364
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 55.417
+     - type: ndcg_at_10
+       value: 73.568
+     - type: ndcg_at_100
+       value: 75.35
+     - type: ndcg_at_1000
+       value: 75.478
+     - type: ndcg_at_3
+       value: 67.201
+     - type: ndcg_at_5
+       value: 70.896
+     - type: precision_at_1
+       value: 55.417
+     - type: precision_at_10
+       value: 11.036999999999999
+     - type: precision_at_100
+       value: 1.204
+     - type: precision_at_1000
+       value: 0.121
+     - type: precision_at_3
+       value: 29.654000000000003
+     - type: precision_at_5
+       value: 20.006
+     - type: recall_at_1
+       value: 49.351
+     - type: recall_at_10
+       value: 91.667
+     - type: recall_at_100
+       value: 98.89
+     - type: recall_at_1000
+       value: 99.812
+     - type: recall_at_3
+       value: 75.715
+     - type: recall_at_5
+       value: 84.072
+     - type: main_score
+       value: 73.568
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB QuoraRetrieval
+       revision: e4e08e0b7dbe3c8700f0daef558ff32256715259
+       split: test
+       type: mteb/quora
+     metrics:
+     - type: map_at_1
+       value: 71.358
+     - type: map_at_10
+       value: 85.474
+     - type: map_at_100
+       value: 86.101
+     - type: map_at_1000
+       value: 86.114
+     - type: map_at_3
+       value: 82.562
+     - type: map_at_5
+       value: 84.396
+     - type: mrr_at_1
+       value: 0.0
+     - type: mrr_at_10
+       value: 0.0
+     - type: mrr_at_100
+       value: 0.0
+     - type: mrr_at_1000
+       value: 0.0
+     - type: mrr_at_3
+       value: 0.0
+     - type: mrr_at_5
+       value: 0.0
+     - type: ndcg_at_1
+       value: 82.12
+     - type: ndcg_at_10
+       value: 89.035
+     - type: ndcg_at_100
+       value: 90.17399999999999
+     - type: ndcg_at_1000
+       value: 90.243
+     - type: ndcg_at_3
+       value: 86.32300000000001
+     - type: ndcg_at_5
+       value: 87.85
+     - type: precision_at_1
+       value: 82.12
+     - type: precision_at_10
+       value: 13.55
+     - type: precision_at_100
+       value: 1.54
+     - type: precision_at_1000
+       value: 0.157
+     - type: precision_at_3
+       value: 37.89
+     - type: precision_at_5
+       value: 24.9
+     - type: recall_at_1
+       value: 71.358
+     - type: recall_at_10
+       value: 95.855
+     - type: recall_at_100
+       value: 99.711
+     - type: recall_at_1000
+       value: 99.994
+     - type: recall_at_3
+       value: 88.02
+     - type: recall_at_5
+       value: 92.378
+     - type: main_score
+       value: 89.035
+     task:
+       type: Retrieval
+   - dataset:
+       config: default
+       name: MTEB RedditClustering
+       revision: 24640382cdbf8abc73003fb0fa6d111a705499eb
+       split: test
+       type: mteb/reddit-clustering
+     metrics:
+     - type: main_score
+       value: 71.0984522742521
+     - type: v_measure
+       value: 71.0984522742521
+     - type: v_measure_std
+       value: 3.5668139917058044
+     task:
+       type: Clustering
+   - dataset:
+       config: default
+       name: MTEB RedditClusteringP2P
+       revision: 385e3cb46b4cfa89021f56c4380204149d0efe33
1153
+ split: test
1154
+ type: mteb/reddit-clustering-p2p
1155
+ metrics:
1156
+ - type: main_score
1157
+ value: 74.94499641904133
1158
+ - type: v_measure
1159
+ value: 74.94499641904133
1160
+ - type: v_measure_std
1161
+ value: 11.419672879389248
1162
+ task:
1163
+ type: Clustering
1164
+ - dataset:
1165
+ config: default
1166
+ name: MTEB SCIDOCS
1167
+ revision: f8c2fcf00f625baaa80f62ec5bd9e1fff3b8ae88
1168
+ split: test
1169
+ type: mteb/scidocs
1170
+ metrics:
1171
+ - type: map_at_1
1172
+ value: 5.343
1173
+ - type: map_at_10
1174
+ value: 13.044
1175
+ - type: map_at_100
1176
+ value: 15.290999999999999
1177
+ - type: map_at_1000
1178
+ value: 15.609
1179
+ - type: map_at_3
1180
+ value: 9.227
1181
+ - type: map_at_5
1182
+ value: 11.158
1183
+ - type: mrr_at_1
1184
+ value: 0.0
1185
+ - type: mrr_at_10
1186
+ value: 0.0
1187
+ - type: mrr_at_100
1188
+ value: 0.0
1189
+ - type: mrr_at_1000
1190
+ value: 0.0
1191
+ - type: mrr_at_3
1192
+ value: 0.0
1193
+ - type: mrr_at_5
1194
+ value: 0.0
1195
+ - type: ndcg_at_1
1196
+ value: 26.3
1197
+ - type: ndcg_at_10
1198
+ value: 21.901
1199
+ - type: ndcg_at_100
1200
+ value: 30.316
1201
+ - type: ndcg_at_1000
1202
+ value: 35.547000000000004
1203
+ - type: ndcg_at_3
1204
+ value: 20.560000000000002
1205
+ - type: ndcg_at_5
1206
+ value: 18.187
1207
+ - type: precision_at_1
1208
+ value: 26.3
1209
+ - type: precision_at_10
1210
+ value: 11.34
1211
+ - type: precision_at_100
1212
+ value: 2.344
1213
+ - type: precision_at_1000
1214
+ value: 0.359
1215
+ - type: precision_at_3
1216
+ value: 18.967
1217
+ - type: precision_at_5
1218
+ value: 15.920000000000002
1219
+ - type: recall_at_1
1220
+ value: 5.343
1221
+ - type: recall_at_10
1222
+ value: 22.997
1223
+ - type: recall_at_100
1224
+ value: 47.562
1225
+ - type: recall_at_1000
1226
+ value: 72.94500000000001
1227
+ - type: recall_at_3
1228
+ value: 11.533
1229
+ - type: recall_at_5
1230
+ value: 16.148
1231
+ - type: main_score
1232
+ value: 21.901
1233
+ task:
1234
+ type: Retrieval
1235
+ - dataset:
1236
+ config: default
1237
+ name: MTEB SICK-R
1238
+ revision: 20a6d6f312dd54037fe07a32d58e5e168867909d
1239
+ split: test
1240
+ type: mteb/sickr-sts
1241
+ metrics:
1242
+ - type: cosine_pearson
1243
+ value: 87.3054603493591
1244
+ - type: cosine_spearman
1245
+ value: 82.14763206055602
1246
+ - type: manhattan_pearson
1247
+ value: 84.78737790237557
1248
+ - type: manhattan_spearman
1249
+ value: 81.88455356002758
1250
+ - type: euclidean_pearson
1251
+ value: 85.00668629311117
1252
+ - type: euclidean_spearman
1253
+ value: 82.14763037860851
1254
+ - type: main_score
1255
+ value: 82.14763206055602
1256
+ task:
1257
+ type: STS
1258
+ - dataset:
1259
+ config: default
1260
+ name: MTEB STS12
1261
+ revision: a0d554a64d88156834ff5ae9920b964011b16384
1262
+ split: test
1263
+ type: mteb/sts12-sts
1264
+ metrics:
1265
+ - type: cosine_pearson
1266
+ value: 86.6911864687294
1267
+ - type: cosine_spearman
1268
+ value: 77.89286260403269
1269
+ - type: manhattan_pearson
1270
+ value: 82.87240347680857
1271
+ - type: manhattan_spearman
1272
+ value: 78.10055393740326
1273
+ - type: euclidean_pearson
1274
+ value: 82.72282535777123
1275
+ - type: euclidean_spearman
1276
+ value: 77.89256648406325
1277
+ - type: main_score
1278
+ value: 77.89286260403269
1279
+ task:
1280
+ type: STS
1281
+ - dataset:
1282
+ config: default
1283
+ name: MTEB STS13
1284
+ revision: 7e90230a92c190f1bf69ae9002b8cea547a64cca
1285
+ split: test
1286
+ type: mteb/sts13-sts
1287
+ metrics:
1288
+ - type: cosine_pearson
1289
+ value: 87.7220832598633
1290
+ - type: cosine_spearman
1291
+ value: 88.30238972017452
1292
+ - type: manhattan_pearson
1293
+ value: 87.88214789140248
1294
+ - type: manhattan_spearman
1295
+ value: 88.24770220032391
1296
+ - type: euclidean_pearson
1297
+ value: 87.98610386257103
1298
+ - type: euclidean_spearman
1299
+ value: 88.30238972017452
1300
+ - type: main_score
1301
+ value: 88.30238972017452
1302
+ task:
1303
+ type: STS
1304
+ - dataset:
1305
+ config: default
1306
+ name: MTEB STS14
1307
+ revision: 6031580fec1f6af667f0bd2da0a551cf4f0b2375
1308
+ split: test
1309
+ type: mteb/sts14-sts
1310
+ metrics:
1311
+ - type: cosine_pearson
1312
+ value: 85.70614623247714
1313
+ - type: cosine_spearman
1314
+ value: 84.29920990970672
1315
+ - type: manhattan_pearson
1316
+ value: 84.9836190531721
1317
+ - type: manhattan_spearman
1318
+ value: 84.40933470597638
1319
+ - type: euclidean_pearson
1320
+ value: 84.96652336693347
1321
+ - type: euclidean_spearman
1322
+ value: 84.29920989531965
1323
+ - type: main_score
1324
+ value: 84.29920990970672
1325
+ task:
1326
+ type: STS
1327
+ - dataset:
1328
+ config: default
1329
+ name: MTEB STS15
1330
+ revision: ae752c7c21bf194d8b67fd573edf7ae58183cbe3
1331
+ split: test
1332
+ type: mteb/sts15-sts
1333
+ metrics:
1334
+ - type: cosine_pearson
1335
+ value: 88.4169972425264
1336
+ - type: cosine_spearman
1337
+ value: 89.03555007807218
1338
+ - type: manhattan_pearson
1339
+ value: 88.83068699455478
1340
+ - type: manhattan_spearman
1341
+ value: 89.21877175674125
1342
+ - type: euclidean_pearson
1343
+ value: 88.7251052947544
1344
+ - type: euclidean_spearman
1345
+ value: 89.03557389893083
1346
+ - type: main_score
1347
+ value: 89.03555007807218
1348
+ task:
1349
+ type: STS
1350
+ - dataset:
1351
+ config: default
1352
+ name: MTEB STS16
1353
+ revision: 4d8694f8f0e0100860b497b999b3dbed754a0513
1354
+ split: test
1355
+ type: mteb/sts16-sts
1356
+ metrics:
1357
+ - type: cosine_pearson
1358
+ value: 85.63830579034632
1359
+ - type: cosine_spearman
1360
+ value: 86.77353371581373
1361
+ - type: manhattan_pearson
1362
+ value: 86.24830492396637
1363
+ - type: manhattan_spearman
1364
+ value: 86.96754348626189
1365
+ - type: euclidean_pearson
1366
+ value: 86.09837038778359
1367
+ - type: euclidean_spearman
1368
+ value: 86.77353371581373
1369
+ - type: main_score
1370
+ value: 86.77353371581373
1371
+ task:
1372
+ type: STS
1373
+ - dataset:
1374
+ config: en-en
1375
+ name: MTEB STS17 (en-en)
1376
+ revision: af5e6fb845001ecf41f4c1e033ce921939a2a68d
1377
+ split: test
1378
+ type: mteb/sts17-crosslingual-sts
1379
+ metrics:
1380
+ - type: cosine_pearson
1381
+ value: 91.2204675588959
1382
+ - type: cosine_spearman
1383
+ value: 90.66976712249057
1384
+ - type: manhattan_pearson
1385
+ value: 91.11007808242346
1386
+ - type: manhattan_spearman
1387
+ value: 90.51739232964488
1388
+ - type: euclidean_pearson
1389
+ value: 91.19588941007903
1390
+ - type: euclidean_spearman
1391
+ value: 90.66976712249057
1392
+ - type: main_score
1393
+ value: 90.66976712249057
1394
+ task:
1395
+ type: STS
1396
+ - dataset:
1397
+ config: en
1398
+ name: MTEB STS22 (en)
1399
+ revision: eea2b4fe26a775864c896887d910b76a8098ad3f
1400
+ split: test
1401
+ type: mteb/sts22-crosslingual-sts
1402
+ metrics:
1403
+ - type: cosine_pearson
1404
+ value: 69.34416749707114
1405
+ - type: cosine_spearman
1406
+ value: 68.11632448161046
1407
+ - type: manhattan_pearson
1408
+ value: 68.99243488935281
1409
+ - type: manhattan_spearman
1410
+ value: 67.8398546438258
1411
+ - type: euclidean_pearson
1412
+ value: 69.06376010216088
1413
+ - type: euclidean_spearman
1414
+ value: 68.11632448161046
1415
+ - type: main_score
1416
+ value: 68.11632448161046
1417
+ task:
1418
+ type: STS
1419
+ - dataset:
1420
+ config: default
1421
+ name: MTEB STSBenchmark
1422
+ revision: b0fddb56ed78048fa8b90373c8a3cfc37b684831
1423
+ split: test
1424
+ type: mteb/stsbenchmark-sts
1425
+ metrics:
1426
+ - type: cosine_pearson
1427
+ value: 88.10309739429758
1428
+ - type: cosine_spearman
1429
+ value: 88.40520383147418
1430
+ - type: manhattan_pearson
1431
+ value: 88.50753383813232
1432
+ - type: manhattan_spearman
1433
+ value: 88.66382629460927
1434
+ - type: euclidean_pearson
1435
+ value: 88.35050664609376
1436
+ - type: euclidean_spearman
1437
+ value: 88.40520383147418
1438
+ - type: main_score
1439
+ value: 88.40520383147418
1440
+ task:
1441
+ type: STS
1442
+ - dataset:
1443
+ config: default
1444
+ name: MTEB SciDocsRR
1445
+ revision: d3c5e1fc0b855ab6097bf1cda04dd73947d7caab
1446
+ split: test
1447
+ type: mteb/scidocs-reranking
1448
+ metrics:
1449
+ - type: map
1450
+ value: 87.58627126942797
1451
+ - type: mrr
1452
+ value: 97.01098103058887
1453
+ - type: main_score
1454
+ value: 87.58627126942797
1455
+ task:
1456
+ type: Reranking
1457
+ - dataset:
1458
+ config: default
1459
+ name: MTEB SciFact
1460
+ revision: 0228b52cf27578f30900b9e5271d331663a030d7
1461
+ split: test
1462
+ type: mteb/scifact
1463
+ metrics:
1464
+ - type: map_at_1
1465
+ value: 62.883
1466
+ - type: map_at_10
1467
+ value: 75.371
1468
+ - type: map_at_100
1469
+ value: 75.66000000000001
1470
+ - type: map_at_1000
1471
+ value: 75.667
1472
+ - type: map_at_3
1473
+ value: 72.741
1474
+ - type: map_at_5
1475
+ value: 74.74
1476
+ - type: mrr_at_1
1477
+ value: 0.0
1478
+ - type: mrr_at_10
1479
+ value: 0.0
1480
+ - type: mrr_at_100
1481
+ value: 0.0
1482
+ - type: mrr_at_1000
1483
+ value: 0.0
1484
+ - type: mrr_at_3
1485
+ value: 0.0
1486
+ - type: mrr_at_5
1487
+ value: 0.0
1488
+ - type: ndcg_at_1
1489
+ value: 66.0
1490
+ - type: ndcg_at_10
1491
+ value: 80.12700000000001
1492
+ - type: ndcg_at_100
1493
+ value: 81.291
1494
+ - type: ndcg_at_1000
1495
+ value: 81.464
1496
+ - type: ndcg_at_3
1497
+ value: 76.19
1498
+ - type: ndcg_at_5
1499
+ value: 78.827
1500
+ - type: precision_at_1
1501
+ value: 66.0
1502
+ - type: precision_at_10
1503
+ value: 10.567
1504
+ - type: precision_at_100
1505
+ value: 1.117
1506
+ - type: precision_at_1000
1507
+ value: 0.11299999999999999
1508
+ - type: precision_at_3
1509
+ value: 30.333
1510
+ - type: precision_at_5
1511
+ value: 20.133000000000003
1512
+ - type: recall_at_1
1513
+ value: 62.883
1514
+ - type: recall_at_10
1515
+ value: 93.556
1516
+ - type: recall_at_100
1517
+ value: 98.667
1518
+ - type: recall_at_1000
1519
+ value: 100.0
1520
+ - type: recall_at_3
1521
+ value: 83.322
1522
+ - type: recall_at_5
1523
+ value: 89.756
1524
+ - type: main_score
1525
+ value: 80.12700000000001
1526
+ task:
1527
+ type: Retrieval
1528
+ - dataset:
1529
+ config: default
1530
+ name: MTEB SprintDuplicateQuestions
1531
+ revision: d66bd1f72af766a5cc4b0ca5e00c162f89e8cc46
1532
+ split: test
1533
+ type: mteb/sprintduplicatequestions-pairclassification
1534
+ metrics:
1535
+ - type: cos_sim_accuracy
1536
+ value: 99.87524752475248
1537
+ - type: cos_sim_accuracy_threshold
1538
+ value: 74.86587762832642
1539
+ - type: cos_sim_ap
1540
+ value: 97.02222446606328
1541
+ - type: cos_sim_f1
1542
+ value: 93.66197183098592
1543
+ - type: cos_sim_f1_threshold
1544
+ value: 74.74223375320435
1545
+ - type: cos_sim_precision
1546
+ value: 94.23076923076923
1547
+ - type: cos_sim_recall
1548
+ value: 93.10000000000001
1549
+ - type: dot_accuracy
1550
+ value: 99.87524752475248
1551
+ - type: dot_accuracy_threshold
1552
+ value: 74.86587762832642
1553
+ - type: dot_ap
1554
+ value: 97.02222688043362
1555
+ - type: dot_f1
1556
+ value: 93.66197183098592
1557
+ - type: dot_f1_threshold
1558
+ value: 74.74223375320435
1559
+ - type: dot_precision
1560
+ value: 94.23076923076923
1561
+ - type: dot_recall
1562
+ value: 93.10000000000001
1563
+ - type: euclidean_accuracy
1564
+ value: 99.87524752475248
1565
+ - type: euclidean_accuracy_threshold
1566
+ value: 70.9000825881958
1567
+ - type: euclidean_ap
1568
+ value: 97.02222446606329
1569
+ - type: euclidean_f1
1570
+ value: 93.66197183098592
1571
+ - type: euclidean_f1_threshold
1572
+ value: 71.07426524162292
1573
+ - type: euclidean_precision
1574
+ value: 94.23076923076923
1575
+ - type: euclidean_recall
1576
+ value: 93.10000000000001
1577
+ - type: manhattan_accuracy
1578
+ value: 99.87623762376238
1579
+ - type: manhattan_accuracy_threshold
1580
+ value: 3588.5040283203125
1581
+ - type: manhattan_ap
1582
+ value: 97.09194643777883
1583
+ - type: manhattan_f1
1584
+ value: 93.7375745526839
1585
+ - type: manhattan_f1_threshold
1586
+ value: 3664.3760681152344
1587
+ - type: manhattan_precision
1588
+ value: 93.18181818181817
1589
+ - type: manhattan_recall
1590
+ value: 94.3
1591
+ - type: max_accuracy
1592
+ value: 99.87623762376238
1593
+ - type: max_ap
1594
+ value: 97.09194643777883
1595
+ - type: max_f1
1596
+ value: 93.7375745526839
1597
+ task:
1598
+ type: PairClassification
1599
+ - dataset:
1600
+ config: default
1601
+ name: MTEB StackExchangeClustering
1602
+ revision: 6cbc1f7b2bc0622f2e39d2c77fa502909748c259
1603
+ split: test
1604
+ type: mteb/stackexchange-clustering
1605
+ metrics:
1606
+ - type: main_score
1607
+ value: 82.10134099988541
1608
+ - type: v_measure
1609
+ value: 82.10134099988541
1610
+ - type: v_measure_std
1611
+ value: 2.7926349897769533
1612
+ task:
1613
+ type: Clustering
1614
+ - dataset:
1615
+ config: default
1616
+ name: MTEB StackExchangeClusteringP2P
1617
+ revision: 815ca46b2622cec33ccafc3735d572c266efdb44
1618
+ split: test
1619
+ type: mteb/stackexchange-clustering-p2p
1620
+ metrics:
1621
+ - type: main_score
1622
+ value: 48.357450742397404
1623
+ - type: v_measure
1624
+ value: 48.357450742397404
1625
+ - type: v_measure_std
1626
+ value: 1.520118876440547
1627
+ task:
1628
+ type: Clustering
1629
+ - dataset:
1630
+ config: default
1631
+ name: MTEB StackOverflowDupQuestions
1632
+ revision: e185fbe320c72810689fc5848eb6114e1ef5ec69
1633
+ split: test
1634
+ type: mteb/stackoverflowdupquestions-reranking
1635
+ metrics:
1636
+ - type: map
1637
+ value: 55.79277200802986
1638
+ - type: mrr
1639
+ value: 56.742517082590616
1640
+ - type: main_score
1641
+ value: 55.79277200802986
1642
+ task:
1643
+ type: Reranking
1644
+ - dataset:
1645
+ config: default
1646
+ name: MTEB SummEval
1647
+ revision: cda12ad7615edc362dbf25a00fdd61d3b1eaf93c
1648
+ split: test
1649
+ type: mteb/summeval
1650
+ metrics:
1651
+ - type: cosine_spearman
1652
+ value: 30.701215774712693
1653
+ - type: cosine_pearson
1654
+ value: 31.26740037278488
1655
+ - type: dot_spearman
1656
+ value: 30.701215774712693
1657
+ - type: dot_pearson
1658
+ value: 31.267404144879997
1659
+ - type: main_score
1660
+ value: 30.701215774712693
1661
+ task:
1662
+ type: Summarization
1663
+ - dataset:
1664
+ config: default
1665
+ name: MTEB TRECCOVID
1666
+ revision: bb9466bac8153a0349341eb1b22e06409e78ef4e
1667
+ split: test
1668
+ type: mteb/trec-covid
1669
+ metrics:
1670
+ - type: map_at_1
1671
+ value: 0.23800000000000002
1672
+ - type: map_at_10
1673
+ value: 2.31
1674
+ - type: map_at_100
1675
+ value: 15.495000000000001
1676
+ - type: map_at_1000
1677
+ value: 38.829
1678
+ - type: map_at_3
1679
+ value: 0.72
1680
+ - type: map_at_5
1681
+ value: 1.185
1682
+ - type: mrr_at_1
1683
+ value: 0.0
1684
+ - type: mrr_at_10
1685
+ value: 0.0
1686
+ - type: mrr_at_100
1687
+ value: 0.0
1688
+ - type: mrr_at_1000
1689
+ value: 0.0
1690
+ - type: mrr_at_3
1691
+ value: 0.0
1692
+ - type: mrr_at_5
1693
+ value: 0.0
1694
+ - type: ndcg_at_1
1695
+ value: 91.0
1696
+ - type: ndcg_at_10
1697
+ value: 88.442
1698
+ - type: ndcg_at_100
1699
+ value: 71.39
1700
+ - type: ndcg_at_1000
1701
+ value: 64.153
1702
+ - type: ndcg_at_3
1703
+ value: 89.877
1704
+ - type: ndcg_at_5
1705
+ value: 89.562
1706
+ - type: precision_at_1
1707
+ value: 92.0
1708
+ - type: precision_at_10
1709
+ value: 92.60000000000001
1710
+ - type: precision_at_100
1711
+ value: 73.74000000000001
1712
+ - type: precision_at_1000
1713
+ value: 28.222
1714
+ - type: precision_at_3
1715
+ value: 94.0
1716
+ - type: precision_at_5
1717
+ value: 93.60000000000001
1718
+ - type: recall_at_1
1719
+ value: 0.23800000000000002
1720
+ - type: recall_at_10
1721
+ value: 2.428
1722
+ - type: recall_at_100
1723
+ value: 18.099999999999998
1724
+ - type: recall_at_1000
1725
+ value: 60.79599999999999
1726
+ - type: recall_at_3
1727
+ value: 0.749
1728
+ - type: recall_at_5
1729
+ value: 1.238
1730
+ - type: main_score
1731
+ value: 88.442
1732
+ task:
1733
+ type: Retrieval
1734
+ - dataset:
1735
+ config: default
1736
+ name: MTEB Touche2020
1737
+ revision: a34f9a33db75fa0cbb21bb5cfc3dae8dc8bec93f
1738
+ split: test
1739
+ type: mteb/touche2020
1740
+ metrics:
1741
+ - type: map_at_1
1742
+ value: 3.4939999999999998
1743
+ - type: map_at_10
1744
+ value: 12.531999999999998
1745
+ - type: map_at_100
1746
+ value: 19.147
1747
+ - type: map_at_1000
1748
+ value: 20.861
1749
+ - type: map_at_3
1750
+ value: 7.558
1751
+ - type: map_at_5
1752
+ value: 9.49
1753
+ - type: mrr_at_1
1754
+ value: 0.0
1755
+ - type: mrr_at_10
1756
+ value: 0.0
1757
+ - type: mrr_at_100
1758
+ value: 0.0
1759
+ - type: mrr_at_1000
1760
+ value: 0.0
1761
+ - type: mrr_at_3
1762
+ value: 0.0
1763
+ - type: mrr_at_5
1764
+ value: 0.0
1765
+ - type: ndcg_at_1
1766
+ value: 47.959
1767
+ - type: ndcg_at_10
1768
+ value: 31.781
1769
+ - type: ndcg_at_100
1770
+ value: 42.131
1771
+ - type: ndcg_at_1000
1772
+ value: 53.493
1773
+ - type: ndcg_at_3
1774
+ value: 39.204
1775
+ - type: ndcg_at_5
1776
+ value: 34.635
1777
+ - type: precision_at_1
1778
+ value: 48.980000000000004
1779
+ - type: precision_at_10
1780
+ value: 27.143
1781
+ - type: precision_at_100
1782
+ value: 8.224
1783
+ - type: precision_at_1000
1784
+ value: 1.584
1785
+ - type: precision_at_3
1786
+ value: 38.775999999999996
1787
+ - type: precision_at_5
1788
+ value: 33.061
1789
+ - type: recall_at_1
1790
+ value: 3.4939999999999998
1791
+ - type: recall_at_10
1792
+ value: 18.895
1793
+ - type: recall_at_100
1794
+ value: 50.192
1795
+ - type: recall_at_1000
1796
+ value: 85.167
1797
+ - type: recall_at_3
1798
+ value: 8.703
1799
+ - type: recall_at_5
1800
+ value: 11.824
1801
+ - type: main_score
1802
+ value: 31.781
1803
+ task:
1804
+ type: Retrieval
1805
+ - dataset:
1806
+ config: default
1807
+ name: MTEB ToxicConversationsClassification
1808
+ revision: edfaf9da55d3dd50d43143d90c1ac476895ae6de
1809
+ split: test
1810
+ type: mteb/toxic_conversations_50k
1811
+ metrics:
1812
+ - type: accuracy
1813
+ value: 92.7402
1814
+ - type: accuracy_stderr
1815
+ value: 1.020764595781027
1816
+ - type: ap
1817
+ value: 44.38594756333084
1818
+ - type: ap_stderr
1819
+ value: 1.817150701258273
1820
+ - type: f1
1821
+ value: 79.95699280019547
1822
+ - type: f1_stderr
1823
+ value: 1.334582498702029
1824
+ - type: main_score
1825
+ value: 92.7402
1826
+ task:
1827
+ type: Classification
1828
+ - dataset:
1829
+ config: default
1830
+ name: MTEB TweetSentimentExtractionClassification
1831
+ revision: d604517c81ca91fe16a244d1248fc021f9ecee7a
1832
+ split: test
1833
+ type: mteb/tweet_sentiment_extraction
1834
+ metrics:
1835
+ - type: accuracy
1836
+ value: 80.86870401810978
1837
+ - type: accuracy_stderr
1838
+ value: 0.22688467782004712
1839
+ - type: f1
1840
+ value: 81.1829040745744
1841
+ - type: f1_stderr
1842
+ value: 0.19774920574849694
1843
+ - type: main_score
1844
+ value: 80.86870401810978
1845
+ task:
1846
+ type: Classification
1847
+ - dataset:
1848
+ config: default
1849
+ name: MTEB TwentyNewsgroupsClustering
1850
+ revision: 6125ec4e24fa026cec8a478383ee943acfbd5449
1851
+ split: test
1852
+ type: mteb/twentynewsgroups-clustering
1853
+ metrics:
1854
+ - type: main_score
1855
+ value: 64.82048869927482
1856
+ - type: v_measure
1857
+ value: 64.82048869927482
1858
+ - type: v_measure_std
1859
+ value: 0.9170394252450564
1860
+ task:
1861
+ type: Clustering
1862
+ - dataset:
1863
+ config: default
1864
+ name: MTEB TwitterSemEval2015
1865
+ revision: 70970daeab8776df92f5ea462b6173c0b46fd2d1
1866
+ split: test
1867
+ type: mteb/twittersemeval2015-pairclassification
1868
+ metrics:
1869
+ - type: cos_sim_accuracy
1870
+ value: 88.44251057996067
1871
+ - type: cos_sim_accuracy_threshold
1872
+ value: 70.2150285243988
1873
+ - type: cos_sim_ap
1874
+ value: 81.11422351199913
1875
+ - type: cos_sim_f1
1876
+ value: 73.71062868615887
1877
+ - type: cos_sim_f1_threshold
1878
+ value: 66.507488489151
1879
+ - type: cos_sim_precision
1880
+ value: 70.2799712849964
1881
+ - type: cos_sim_recall
1882
+ value: 77.4934036939314
1883
+ - type: dot_accuracy
1884
+ value: 88.44251057996067
1885
+ - type: dot_accuracy_threshold
1886
+ value: 70.2150285243988
1887
+ - type: dot_ap
1888
+ value: 81.11420529068658
1889
+ - type: dot_f1
1890
+ value: 73.71062868615887
1891
+ - type: dot_f1_threshold
1892
+ value: 66.50749444961548
1893
+ - type: dot_precision
1894
+ value: 70.2799712849964
1895
+ - type: dot_recall
1896
+ value: 77.4934036939314
1897
+ - type: euclidean_accuracy
1898
+ value: 88.44251057996067
1899
+ - type: euclidean_accuracy_threshold
1900
+ value: 77.18156576156616
1901
+ - type: euclidean_ap
1902
+ value: 81.11422421732487
1903
+ - type: euclidean_f1
1904
+ value: 73.71062868615887
1905
+ - type: euclidean_f1_threshold
1906
+ value: 81.84436559677124
1907
+ - type: euclidean_precision
1908
+ value: 70.2799712849964
1909
+ - type: euclidean_recall
1910
+ value: 77.4934036939314
1911
+ - type: manhattan_accuracy
1912
+ value: 88.26369434344639
1913
+ - type: manhattan_accuracy_threshold
1914
+ value: 3837.067413330078
1915
+ - type: manhattan_ap
1916
+ value: 80.81442360477725
1917
+ - type: manhattan_f1
1918
+ value: 73.39883099117024
1919
+ - type: manhattan_f1_threshold
1920
+ value: 4098.833847045898
1921
+ - type: manhattan_precision
1922
+ value: 69.41896024464832
1923
+ - type: manhattan_recall
1924
+ value: 77.86279683377309
1925
+ - type: max_accuracy
1926
+ value: 88.44251057996067
1927
+ - type: max_ap
1928
+ value: 81.11422421732487
1929
+ - type: max_f1
1930
+ value: 73.71062868615887
1931
+ task:
1932
+ type: PairClassification
1933
+ - dataset:
1934
+ config: default
1935
+ name: MTEB TwitterURLCorpus
1936
+ revision: 8b6510b0b1fa4e4c4f879467980e9be563ec1cdf
1937
+ split: test
1938
+ type: mteb/twitterurlcorpus-pairclassification
1939
+ metrics:
1940
+ - type: cos_sim_accuracy
1941
+ value: 90.03182365040556
1942
+ - type: cos_sim_accuracy_threshold
1943
+ value: 64.46443796157837
1944
+ - type: cos_sim_ap
1945
+ value: 87.86649113691112
1946
+ - type: cos_sim_f1
1947
+ value: 80.45644844577821
1948
+ - type: cos_sim_f1_threshold
1949
+ value: 61.40774488449097
1950
+ - type: cos_sim_precision
1951
+ value: 77.54052702992216
1952
+ - type: cos_sim_recall
1953
+ value: 83.60024638127503
1954
+ - type: dot_accuracy
1955
+ value: 90.03182365040556
1956
+ - type: dot_accuracy_threshold
1957
+ value: 64.46444988250732
1958
+ - type: dot_ap
1959
+ value: 87.86649011954319
1960
+ - type: dot_f1
1961
+ value: 80.45644844577821
1962
+ - type: dot_f1_threshold
1963
+ value: 61.407750844955444
1964
+ - type: dot_precision
1965
+ value: 77.54052702992216
1966
+ - type: dot_recall
1967
+ value: 83.60024638127503
1968
+ - type: euclidean_accuracy
1969
+ value: 90.03182365040556
1970
+ - type: euclidean_accuracy_threshold
1971
+ value: 84.30368900299072
1972
+ - type: euclidean_ap
1973
+ value: 87.86649114275045
1974
+ - type: euclidean_f1
1975
+ value: 80.45644844577821
1976
+ - type: euclidean_f1_threshold
1977
+ value: 87.8547191619873
1978
+ - type: euclidean_precision
1979
+ value: 77.54052702992216
1980
+ - type: euclidean_recall
1981
+ value: 83.60024638127503
1982
+ - type: manhattan_accuracy
1983
+ value: 89.99883572010712
1984
+ - type: manhattan_accuracy_threshold
1985
+ value: 4206.838607788086
1986
+ - type: manhattan_ap
1987
+ value: 87.8600826607838
1988
+ - type: manhattan_f1
1989
+ value: 80.44054508120217
1990
+ - type: manhattan_f1_threshold
1991
+ value: 4372.755432128906
1992
+ - type: manhattan_precision
1993
+ value: 78.08219178082192
1994
+ - type: manhattan_recall
1995
+ value: 82.94579611949491
1996
+ - type: max_accuracy
1997
+ value: 90.03182365040556
1998
+ - type: max_ap
1999
+ value: 87.86649114275045
2000
+ - type: max_f1
2001
+ value: 80.45644844577821
2002
+ task:
2003
+ type: PairClassification
2004
+ language:
2005
+ - en
2006
+ license: cc-by-nc-4.0
2007
+ library_name: transformers
2008
+ ---
2009
+ ## Introduction
2010
+ We present NV-Embed-v2, a generalist embedding model that ranks No. 1 on the Massive Text Embedding Benchmark ([MTEB benchmark](https://huggingface.co/spaces/mteb/leaderboard)) (as of Aug 30, 2024) with a score of 72.31 across 56 text embedding tasks. It also holds the No. 1 spot in the retrieval sub-category (a score of 62.65 across 15 tasks) on the leaderboard, which is essential to the development of RAG technology.
2011
+
2012
+ NV-Embed-v2 introduces several new designs, including having the LLM attend to latent vectors for better pooled embedding output, and a two-stage instruction tuning method that enhances the accuracy of both retrieval and non-retrieval tasks. Additionally, NV-Embed-v2 incorporates a novel hard-negative mining method that takes the positive relevance score into account to better remove false negatives.
2013
+
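To make the mining idea above concrete, here is a minimal, hypothetical sketch (not the released training code) of positive-aware hard-negative filtering: candidate negatives whose similarity to the query exceeds a fraction of the positive passage's score are treated as likely false negatives and discarded. The function name, margin value, and scores are all illustrative assumptions.

```python
# Hypothetical sketch of positive-aware hard-negative filtering.
# Candidates scoring at or above margin * positive_score are likely
# unlabeled positives (false negatives) and are dropped.
def filter_hard_negatives(pos_score, candidates, margin=0.95):
    """Keep (doc, score) candidates scoring below margin * pos_score."""
    return [(doc, s) for doc, s in candidates if s < margin * pos_score]

# Made-up retrieval scores for three candidate negatives:
candidates = [("d1", 0.92), ("d2", 0.70), ("d3", 0.55)]
print(filter_hard_negatives(0.90, candidates))
# [('d2', 0.7), ('d3', 0.55)]  -- d1 (0.92 >= 0.855) is dropped as a likely false negative
```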
2014
+ For more technical details, refer to our paper: [NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models](https://arxiv.org/pdf/2405.17428).
2015
+
2016
+ ## Model Details
2017
+ - Base Decoder-only LLM: [Mistral-7B-v0.1](https://huggingface.co/mistralai/Mistral-7B-v0.1)
2018
+ - Pooling Type: Latent-Attention
2019
+ - Embedding Dimension: 4096
2020
+
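The latent-attention pooling listed above can be sketched roughly as follows: token hidden states cross-attend to a small set of trainable latent vectors, and the attended output is mean-pooled into a single embedding. This is an illustrative toy implementation under assumed dimensions, not the model's actual code; see the paper for the real architecture.

```python
# Toy sketch of latent-attention pooling (illustrative only; dimensions
# are tiny here, while NV-Embed-v2 outputs 4096-dim embeddings).
import torch
import torch.nn as nn

class LatentAttentionPooling(nn.Module):
    def __init__(self, hidden_dim: int, num_latents: int = 16, num_heads: int = 4):
        super().__init__()
        # Trainable latent array the token states attend to
        self.latents = nn.Parameter(torch.randn(num_latents, hidden_dim))
        self.cross_attn = nn.MultiheadAttention(hidden_dim, num_heads, batch_first=True)

    def forward(self, token_states: torch.Tensor) -> torch.Tensor:
        # token_states: (batch, seq_len, hidden_dim)
        batch = token_states.size(0)
        latents = self.latents.unsqueeze(0).expand(batch, -1, -1)
        # Tokens act as queries; latents act as keys and values
        attended, _ = self.cross_attn(token_states, latents, latents)
        # Mean-pool the attended sequence into one embedding per input
        return attended.mean(dim=1)

pooler = LatentAttentionPooling(hidden_dim=64)
out = pooler(torch.randn(2, 10, 64))
print(out.shape)  # torch.Size([2, 64])
```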
2021
+ ## How to use
2022
+
2023
+ Here are examples of how to encode queries and passages using Hugging Face Transformers and Sentence-Transformers. Please find the required package versions [here](https://huggingface.co/nvidia/NV-Embed-v2#2-required-packages).
2024
+
2025
+ ### Usage (HuggingFace Transformers)
2026
+
2027
+ ```python
2028
+ import torch
2029
+ import torch.nn.functional as F
2030
+ from transformers import AutoTokenizer, AutoModel
2031
+
2032
+ # Each query needs to be accompanied by a corresponding instruction describing the task.
2033
+ task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}
2034
+
2035
+ query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
2036
+ queries = [
2037
+ 'are judo throws allowed in wrestling?',
2038
+ 'how to become a radiology technician in michigan?'
2039
+ ]
2040
+
2041
+ # No instruction needed for retrieval passages
2042
+ passage_prefix = ""
2043
+ passages = [
2044
+ "Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.",
2045
+ "Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
2046
+ ]
2047
+
2048
+ # load the model (trust_remote_code is required for the custom NV-Embed architecture)
2049
+ model = AutoModel.from_pretrained('nvidia/NV-Embed-v2', trust_remote_code=True)
2050
+
2051
+ # get the embeddings
2052
+ max_length = 32768
2053
+ query_embeddings = model.encode(queries, instruction=query_prefix, max_length=max_length)
2054
+ passage_embeddings = model.encode(passages, instruction=passage_prefix, max_length=max_length)
2055
+
2056
+ # normalize embeddings
2057
+ query_embeddings = F.normalize(query_embeddings, p=2, dim=1)
2058
+ passage_embeddings = F.normalize(passage_embeddings, p=2, dim=1)
2059
+
2060
+ # get the embeddings with DataLoader (splitting the dataset into multiple mini-batches)
2061
+ # batch_size=2
2062
+ # query_embeddings = model._do_encode(queries, batch_size=batch_size, instruction=query_prefix, max_length=max_length, num_workers=32, return_numpy=True)
2063
+ # passage_embeddings = model._do_encode(passages, batch_size=batch_size, instruction=passage_prefix, max_length=max_length, num_workers=32, return_numpy=True)
2064
+
2065
+ scores = (query_embeddings @ passage_embeddings.T) * 100
2066
+ print(scores.tolist())
2067
+ # [[87.42693328857422, 0.46283677220344543], [0.965264618396759, 86.03721618652344]]
2068
+ ```
2069
+
2070
+
2071
+ ### Usage (Sentence-Transformers)
2072
+
2073
+ ```python
2074
+ import torch
2075
+ from sentence_transformers import SentenceTransformer
2076
+
2077
+ # Each query needs to be accompanied by a corresponding instruction describing the task.
2078
+ task_name_to_instruct = {"example": "Given a question, retrieve passages that answer the question",}
2079
+
2080
+ query_prefix = "Instruct: "+task_name_to_instruct["example"]+"\nQuery: "
2081
+ queries = [
2082
+ 'are judo throws allowed in wrestling?',
2083
+ 'how to become a radiology technician in michigan?'
2084
+ ]
2085
+
2086
+ # No instruction needed for retrieval passages
2087
+ passages = [
2088
+ "Since you're reading this, you are probably someone from a judo background or someone who is just wondering how judo techniques can be applied under wrestling rules. So without further ado, let's get to the question. Are Judo throws allowed in wrestling? Yes, judo throws are allowed in freestyle and folkstyle wrestling. You only need to be careful to follow the slam rules when executing judo throws. In wrestling, a slam is lifting and returning an opponent to the mat with unnecessary force.",
2089
+ "Below are the basic steps to becoming a radiologic technologist in Michigan:Earn a high school diploma. As with most careers in health care, a high school education is the first step to finding entry-level employment. Taking classes in math and science, such as anatomy, biology, chemistry, physiology, and physics, can help prepare students for their college studies and future careers.Earn an associate degree. Entry-level radiologic positions typically require at least an Associate of Applied Science. Before enrolling in one of these degree programs, students should make sure it has been properly accredited by the Joint Review Committee on Education in Radiologic Technology (JRCERT).Get licensed or certified in the state of Michigan."
2090
+ ]
2091
+
2092
+ # load model with tokenizer
2093
+ model = SentenceTransformer('nvidia/NV-Embed-v2', trust_remote_code=True)
2094
+ model.max_seq_length = 32768
2095
+ model.tokenizer.padding_side="right"
2096
+
2097
+ def add_eos(input_examples):
2098
+ input_examples = [input_example + model.tokenizer.eos_token for input_example in input_examples]
2099
+ return input_examples
2100
+
2101
+ # get the embeddings
2102
+ batch_size = 2
2103
+ query_embeddings = model.encode(add_eos(queries), batch_size=batch_size, prompt=query_prefix, normalize_embeddings=True)
2104
+ passage_embeddings = model.encode(add_eos(passages), batch_size=batch_size, normalize_embeddings=True)
2105
+
2106
+ scores = (query_embeddings @ passage_embeddings.T) * 100
2107
+ print(scores.tolist())
2108
+ ```
2109
+
2110
+ ## License
2111
+ This model must not be used for any commercial purpose. Refer to the [license](https://spdx.org/licenses/CC-BY-NC-4.0) for the detailed terms.
2112
+
2113
+ For commercial use, we recommend the models from [NeMo Retriever Microservices (NIMs)](https://build.nvidia.com/explore/retrieval).
2114
+
2115
+
2116
+ ## Correspondence to
2117
+ Chankyu Lee (chankyul@nvidia.com), Rajarshi Roy (rajarshir@nvidia.com), Wei Ping (wping@nvidia.com)
2118
+
2119
+
2120
+ ## Citation
2121
+ If you find this code useful in your research, please consider citing:
2122
+
2123
+ ```bibtex
2124
+ @article{lee2024nv,
2125
+ title={NV-Embed: Improved Techniques for Training LLMs as Generalist Embedding Models},
2126
+ author={Lee, Chankyu and Roy, Rajarshi and Xu, Mengyao and Raiman, Jonathan and Shoeybi, Mohammad and Catanzaro, Bryan and Ping, Wei},
2127
+ journal={arXiv preprint arXiv:2405.17428},
2128
+ year={2024}
2129
+ }
2130
+ ```
2131
+ ```bibtex
2132
+ @article{moreira2024nv,
2133
+ title={NV-Retriever: Improving text embedding models with effective hard-negative mining},
2134
+ author={Moreira, Gabriel de Souza P and Osmulski, Radek and Xu, Mengyao and Ak, Ronay and Schifferer, Benedikt and Oldridge, Even},
2135
+ journal={arXiv preprint arXiv:2407.15831},
2136
+ year={2024}
2137
+ }
2138
+ ```
2139
+
2140
+
2141
+ ## Troubleshooting
2142
+
2143
+ #### 1. Instruction template for MTEB benchmarks
2144
+
2145
+ For the MTEB retrieval, STS, and summarization sub-tasks, please use the instruction prefix template in [instructions.json](https://huggingface.co/nvidia/NV-Embed-v2/blob/main/instructions.json). For classification, clustering, and reranking, please use the instructions provided in Table 7 of the [NV-Embed paper](https://arxiv.org/pdf/2405.17428).
2146
+
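As a minimal sketch of how the instruction template is applied: retrieval entries in instructions.json carry a `"query"` instruction (the `"corpus"` side is left empty), and the query prefix follows the same `"Instruct: ...\nQuery: "` pattern used in the usage examples above. The excerpt below inlines two entries for illustration; in practice you would load the downloaded instructions.json file instead.

```python
import json

# Excerpt mirroring two retrieval entries from instructions.json
# (inlined here so the sketch is self-contained).
instructions = json.loads("""
{
  "NQ": {"query": "Given a question, retrieve passages that answer the question", "corpus": ""},
  "SciFact": {"query": "Given a scientific claim, retrieve documents that support or refute the claim", "corpus": ""}
}
""")

def build_query_prefix(task: str) -> str:
    # Same template as the usage examples: "Instruct: <task instruction>\nQuery: "
    return "Instruct: " + instructions[task]["query"] + "\nQuery: "

print(build_query_prefix("NQ"))
```

The resulting string is what you would pass as the `prompt` argument of `model.encode(...)` for queries; passages are encoded without any prefix.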
2147
+ #### 2. Required Packages
2148
+
2149
+ If you run into issues, try installing the Python packages below:
2150
+ ```bash
2151
+ pip uninstall -y transformer-engine
2152
+ pip install torch==2.2.0
2153
+ pip install transformers==4.42.4
2154
+ pip install flash-attn==2.2.0
2155
+ pip install sentence-transformers==2.7.0
2156
+ ```
2157
+
2158
+ #### 3. How to enable multi-GPU (note: this applies to the HuggingFace Transformers usage)
2159
+ ```python
2160
+ from transformers import AutoModel
2161
+ from torch.nn import DataParallel
2162
+
2163
+ embedding_model = AutoModel.from_pretrained("nvidia/NV-Embed-v2")
2164
+ for module_key, module in embedding_model._modules.items():
2165
+ embedding_model._modules[module_key] = DataParallel(module)
2166
+ ```
2167
+
2168
+ #### 4. Fixing "nvidia/NV-Embed-v2 is not the path to a directory containing a file named config.json"
2169
+
2170
+ Switch to your local model path, then open config.json and replace the value of **"_name_or_path"** with your local model path.
2171
+
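The fix above can be sketched programmatically. This is a self-contained illustration: a temporary directory stands in for your actual local model directory, and a minimal config.json stands in for the real one.

```python
import json
import os
import tempfile

# A temporary directory stands in for your local NV-Embed-v2 download.
local_path = tempfile.mkdtemp()
cfg_file = os.path.join(local_path, "config.json")

# Minimal stand-in config pointing at the hub repo id.
with open(cfg_file, "w") as f:
    json.dump({"_name_or_path": "nvidia/NV-Embed-v2", "model_type": "nvembed"}, f)

# The fix: rewrite "_name_or_path" to the local model directory.
with open(cfg_file) as f:
    cfg = json.load(f)
cfg["_name_or_path"] = local_path
with open(cfg_file, "w") as f:
    json.dump(cfg, f, indent=2)
```

After this change, `AutoModel.from_pretrained(local_path, ...)` resolves the config from the local directory instead of trying to re-fetch the hub repo.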
2172
+
2173
+ #### 5. Access to model nvidia/NV-Embed-v2 is restricted. You must be authenticated to access it
2174
+
2175
+ Log in with your Hugging Face access [token](https://huggingface.co/settings/tokens) by running *"huggingface-cli login"*.
2176
+
2177
+ #### 6. How to resolve a slight mismatch in Sentence Transformers results
2178
+
2179
+ A slight mismatch in the Sentence Transformer implementation is caused by a discrepancy in the calculation of the instruction prefix length within the Sentence Transformer package.
2180
+
2181
+ To fix this issue, build the Sentence Transformers package from source, applying the modification described below to this [line](https://github.com/UKPLab/sentence-transformers/blob/v2.7-release/sentence_transformers/SentenceTransformer.py#L353).
2182
+ ```bash
2183
+ git clone https://github.com/UKPLab/sentence-transformers.git
2184
+ cd sentence-transformers
2185
+ git checkout v2.7-release
2186
+ # Modify L353 in SentenceTransformer.py to **'extra_features["prompt_length"] = tokenized_prompt["input_ids"].shape[-1]'**.
2187
+ pip install -e .
2188
+ ```
config.json ADDED
@@ -0,0 +1,101 @@
1
+ {
2
+ "_name_or_path": "nvidia/NV-Embed-v2",
3
+ "add_eos": true,
4
+ "add_pad_token": true,
5
+ "architectures": [
6
+ "NVEmbedModel"
7
+ ],
8
+ "auto_map": {
9
+ "AutoConfig": "configuration_nvembed.NVEmbedConfig",
10
+ "AutoModel": "modeling_nvembed.NVEmbedModel"
11
+ },
12
+ "hidden_size": 4096,
13
+ "is_mask_instruction": true,
14
+ "latent_attention_config": {
15
+ "model_type": "latent_attention"
16
+ },
17
+ "mask_type": "b",
18
+ "model_type": "nvembed",
19
+ "padding_side": "right",
20
+ "text_config": {
21
+ "_name_or_path": "nvidia/NV-Embed-v2",
22
+ "add_cross_attention": false,
23
+ "architectures": [
24
+ "MistralModel"
25
+ ],
26
+ "attention_dropout": 0.0,
27
+ "bad_words_ids": null,
28
+ "begin_suppress_tokens": null,
29
+ "bos_token_id": 1,
30
+ "chunk_size_feed_forward": 0,
31
+ "cross_attention_hidden_size": null,
32
+ "decoder_start_token_id": null,
33
+ "diversity_penalty": 0.0,
34
+ "do_sample": false,
35
+ "early_stopping": false,
36
+ "encoder_no_repeat_ngram_size": 0,
37
+ "eos_token_id": 2,
38
+ "exponential_decay_length_penalty": null,
39
+ "finetuning_task": null,
40
+ "forced_bos_token_id": null,
41
+ "forced_eos_token_id": null,
42
+ "hidden_act": "silu",
43
+ "hidden_size": 4096,
44
+ "id2label": {
45
+ "0": "LABEL_0",
46
+ "1": "LABEL_1"
47
+ },
48
+ "initializer_range": 0.02,
49
+ "intermediate_size": 14336,
50
+ "is_decoder": false,
51
+ "is_encoder_decoder": false,
52
+ "label2id": {
53
+ "LABEL_0": 0,
54
+ "LABEL_1": 1
55
+ },
56
+ "length_penalty": 1.0,
57
+ "max_length": 20,
58
+ "max_position_embeddings": 32768,
59
+ "min_length": 0,
60
+ "model_type": "bidir_mistral",
61
+ "no_repeat_ngram_size": 0,
62
+ "num_attention_heads": 32,
63
+ "num_beam_groups": 1,
64
+ "num_beams": 1,
65
+ "num_hidden_layers": 32,
66
+ "num_key_value_heads": 8,
67
+ "num_return_sequences": 1,
68
+ "output_attentions": false,
69
+ "output_hidden_states": false,
70
+ "output_scores": false,
71
+ "pad_token_id": null,
72
+ "prefix": null,
73
+ "problem_type": null,
74
+ "pruned_heads": {},
75
+ "remove_invalid_values": false,
76
+ "repetition_penalty": 1.0,
77
+ "return_dict": true,
78
+ "return_dict_in_generate": false,
79
+ "rms_norm_eps": 1e-05,
80
+ "rope_theta": 10000.0,
81
+ "sep_token_id": null,
82
+ "sliding_window": 4096,
83
+ "suppress_tokens": null,
84
+ "task_specific_params": null,
85
+ "temperature": 1.0,
86
+ "tf_legacy_loss": false,
87
+ "tie_encoder_decoder": false,
88
+ "tie_word_embeddings": false,
89
+ "tokenizer_class": null,
90
+ "top_k": 50,
91
+ "top_p": 1.0,
92
+ "torch_dtype": "float32",
93
+ "torchscript": false,
94
+ "typical_p": 1.0,
95
+ "use_bfloat16": false,
96
+ "use_cache": true,
97
+ "vocab_size": 32000
98
+ },
99
+ "torch_dtype": "float16",
100
+ "transformers_version": "4.42.4"
101
+ }
config_sentence_transformers.json ADDED
@@ -0,0 +1,27 @@
1
+ {
2
+ "__version__": {
3
+ "sentence_transformers": "4.2.0",
4
+ "transformers": "4.47.0",
5
+ "pytorch": "2.5.1+cu12"
6
+ },
7
+ "prompts": {
8
+ "Banking77Classification": "Instruct: Given a question, please describe the intent of this question. \n Question: ",
9
+ "MTOPIntentClassification": "Instruct: Given a question, please describe the intent of this question. \n Question: ",
10
+ "TweetSentimentClassification": "Classify the sentiment of a given tweet as either positive, negative, or neutral.",
11
+ "BiorxivClusteringP2P.v2": "Identify the main category of Biorxiv papers based on the titles and abstracts",
12
+ "BiorxivClusteringS2S.v2": "Identify the main category of Biorxiv papers based on the titles",
13
+ "TwentyNewsgroupsClustering": "Identify the topic or theme of the given news articles",
14
+ "FiQA2018": {
15
+ "query": "Given a financial question, retrieve relevant passages that answer the query"
16
+ },
17
+ "SciFact": {
18
+ "query": "Given a scientific claim, retrieve documents that support or refute the claim"
19
+ },
20
+ "NFCorpus": {
21
+ "query": "Given a question, retrieve relevant documents that answer the question"
22
+ }
23
+ },
24
+ "default_prompt_name": null,
25
+ "model_type": "SparseEncoder",
26
+ "similarity_fn_name": "dot"
27
+ }
configuration_nvembed.py ADDED
@@ -0,0 +1,92 @@
1
+
2
+ from typing import Literal
3
+ from transformers import AutoConfig
4
+ from transformers.configuration_utils import PretrainedConfig
5
+ from transformers.models.auto import CONFIG_MAPPING
6
+ from transformers.models.mistral import MistralConfig
7
+
8
+ NVEMBED_TYPE = "nvembed"
9
+ LATENT_ATTENTION_TYPE = "latent_attention"
10
+ BIDIR_MISTRAL_TYPE = "bidir_mistral"
11
+
12
+ class NVEmbedConfig(PretrainedConfig):
13
+ model_type = "nvembed"
14
+ is_composition = False
15
+
16
+ def __init__(
17
+ self,
18
+ latent_attention_config=None,
19
+ text_config=None,
20
+ padding_side: Literal["right", "left"]="right",
21
+ add_pad_token: bool=True,
22
+ is_mask_instruction: bool = True,
23
+ add_eos: bool=True,
24
+ mask_type: str="b",
25
+ **kwargs,
26
+ ):
27
+ if isinstance(latent_attention_config, dict):
28
+ latent_attention_config["model_type"] = (
29
+ latent_attention_config["model_type"] if "model_type" in latent_attention_config else LATENT_ATTENTION_TYPE
30
+ )
31
+ latent_attention_config = CONFIG_MAPPING[latent_attention_config["model_type"]](**latent_attention_config)
32
+ elif latent_attention_config is None:
33
+ latent_attention_config = CONFIG_MAPPING[LATENT_ATTENTION_TYPE]()
34
+
35
+ self.latent_attention_config = latent_attention_config
36
+
37
+ if isinstance(text_config, dict):
38
+ text_config["model_type"] = text_config["model_type"] if "model_type" in text_config else "llama"
39
+ text_config = CONFIG_MAPPING[text_config["model_type"]](**text_config)
40
+ elif text_config is None:
41
+ text_config = None
42
+
43
+ self.text_config = text_config
44
+ self.padding_side = padding_side
45
+ self.is_mask_instruction = is_mask_instruction
46
+ self.add_pad_token = add_pad_token
47
+ self.add_eos = add_eos
48
+ self.mask_type = mask_type
49
+ if "hidden_size" in kwargs:
50
+ self.hidden_size = kwargs["hidden_size"]
51
+ else:
52
+ self.hidden_size = 4096
53
+
54
+ super().__init__(**kwargs)
55
+
56
+
57
+ class LatentAttentionConfig(PretrainedConfig):
58
+ model_type = LATENT_ATTENTION_TYPE
59
+ is_composition = False
60
+ _name_or_path = "latent_attention"
61
+
62
+ def __init__(
63
+ self,
64
+ num_latents_value: int=512,
65
+ num_cross_heads: int=8,
66
+ output_normalize: bool=True,
67
+ hidden_dim: int=4096,
68
+ latent_dim: int=4096,
69
+ cross_dim_head: int=4096,
70
+ **kwargs,
71
+ ):
72
+ self.num_latents_value = num_latents_value
73
+ self.num_cross_heads = num_cross_heads
74
+ self.output_normalize = output_normalize
75
+ self.hidden_dim = hidden_dim
76
+ self.latent_dim = latent_dim
77
+ self.cross_dim_head = cross_dim_head
78
+
79
+ super().__init__(**kwargs)
80
+
81
+
82
+ class BidirectionalMistralConfig(MistralConfig):
83
+ model_type = BIDIR_MISTRAL_TYPE
84
+ keys_to_ignore_at_inference = ["past_key_values"]
85
+
86
+ AutoConfig.register(NVEMBED_TYPE, NVEmbedConfig)
87
+ AutoConfig.register(LATENT_ATTENTION_TYPE, LatentAttentionConfig)
88
+ AutoConfig.register(BIDIR_MISTRAL_TYPE, BidirectionalMistralConfig)
89
+
90
+ NVEmbedConfig.register_for_auto_class()
91
+ LatentAttentionConfig.register_for_auto_class()
92
+ BidirectionalMistralConfig.register_for_auto_class()
instructions.json ADDED
@@ -0,0 +1,99 @@
1
+ {
2
+ "ClimateFEVER":
3
+ {
4
+ "query": "Given a claim about climate change, retrieve documents that support or refute the claim",
5
+ "corpus": ""
6
+ },
7
+ "HotpotQA":
8
+ {
9
+ "query": "Given a multi-hop question, retrieve documents that can help answer the question",
10
+ "corpus": ""
11
+ },
12
+ "FEVER":
13
+ {
14
+ "query": "Given a claim, retrieve documents that support or refute the claim",
15
+ "corpus": ""
16
+ },
17
+ "MSMARCO":
18
+ {
19
+ "query": "Given a web search query, retrieve relevant passages that answer the query",
20
+ "corpus": ""
21
+ },
22
+ "DBPedia":
23
+ {
24
+ "query": "Given a query, retrieve relevant entity descriptions from DBPedia",
25
+ "corpus": ""
26
+ },
27
+ "NQ":
28
+ {
29
+ "query": "Given a question, retrieve passages that answer the question",
30
+ "corpus": ""
31
+ },
32
+ "QuoraRetrieval":
33
+ {
34
+ "query": "Given a question, retrieve questions that are semantically equivalent to the given question",
35
+ "corpus": "Given a question, retrieve questions that are semantically equivalent to the given question"
36
+ },
37
+ "SCIDOCS":
38
+ {
39
+ "query": "Given a scientific paper title, retrieve paper abstracts that are cited by the given paper",
40
+ "corpus": ""
41
+ },
42
+ "TRECCOVID":
43
+ {
44
+ "query": "Given a query on COVID-19, retrieve documents that answer the query",
45
+ "corpus": ""
46
+ },
47
+ "Touche2020":
48
+ {
49
+ "query": "Given a question, retrieve passages that answer the question",
50
+ "corpus": ""
51
+ },
52
+ "SciFact":
53
+ {
54
+ "query": "Given a scientific claim, retrieve documents that support or refute the claim",
55
+ "corpus": ""
56
+ },
57
+ "NFCorpus":
58
+ {
59
+ "query": "Given a question, retrieve relevant documents that answer the question",
60
+ "corpus": ""
61
+ },
62
+ "ArguAna":
63
+ {
64
+ "query": "Given a claim, retrieve documents that support or refute the claim",
65
+ "corpus": ""
66
+ },
67
+ "FiQA2018":
68
+ {
69
+ "query": "Given a financial question, retrieve relevant passages that answer the query",
70
+ "corpus": ""
71
+ },
72
+ "STS":
73
+ {
74
+ "text": "Retrieve semantically similar text"
75
+ },
76
+ "SUMM":
77
+ {
78
+ "text": "Given a news summary, retrieve other semantically similar summaries"
79
+ }
80
+ ,
81
+ "Banking77Classification": {
82
+ "text": "Instruct: Given a question, please describe the intent of this question. \n Question: "
83
+ },
84
+ "MTOPIntentClassification": {
85
+ "text": "Instruct: Given a question, please describe the intent of this question. \n Question: "
86
+ },
87
+ "TweetSentimentClassification": {
88
+ "text": "Classify the sentiment of a given tweet as either positive, negative, or neutral."
89
+ },
90
+ "BiorxivClusteringP2P.v2": {
91
+ "text": "Identify the main category of Biorxiv papers based on the titles and abstracts"
92
+ },
93
+ "BiorxivClusteringS2S.v2": {
94
+ "text": "Identify the main category of Biorxiv papers based on the titles"
95
+ },
96
+ "TwentyNewsgroupsClustering.v2": {
97
+ "text": "Identify the topic or theme of the given news articles"
98
+ }
99
+ }
model-00001-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:0ce5651268058d961eaeabd4f65a5cb5d003ac7e0e34b7095658b5d5a4802f6a
3
+ size 4997761248
model-00002-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:bbd7e85b57afbc74fab67e50a572590ce57dde8b5fa76fe7527c42189074d57d
3
+ size 4915917048
model-00003-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:87c90f033107075c9531ed8163d4b087ce77e63596c8510821da15a4d892a85c
3
+ size 4999820296
model-00004-of-00004.safetensors ADDED
@@ -0,0 +1,3 @@
1
+ version https://git-lfs.github.com/spec/v1
2
+ oid sha256:44ff251c6b33ed89101915eb82a92575fd7d7daf9db953205f3bb4b982c4c3f5
3
+ size 788571960
model.safetensors.index.json ADDED
@@ -0,0 +1,311 @@
1
+ {
2
+ "metadata": {
3
+ "total_size": 15702032384
4
+ },
5
+ "weight_map": {
6
+ "embedding_model.embed_tokens.weight": "model-00001-of-00004.safetensors",
7
+ "embedding_model.layers.0.input_layernorm.weight": "model-00001-of-00004.safetensors",
8
+ "embedding_model.layers.0.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
9
+ "embedding_model.layers.0.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
10
+ "embedding_model.layers.0.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
11
+ "embedding_model.layers.0.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
12
+ "embedding_model.layers.0.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
13
+ "embedding_model.layers.0.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
14
+ "embedding_model.layers.0.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
15
+ "embedding_model.layers.0.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
16
+ "embedding_model.layers.1.input_layernorm.weight": "model-00001-of-00004.safetensors",
17
+ "embedding_model.layers.1.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
18
+ "embedding_model.layers.1.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
19
+ "embedding_model.layers.1.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
20
+ "embedding_model.layers.1.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
21
+ "embedding_model.layers.1.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
22
+ "embedding_model.layers.1.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
23
+ "embedding_model.layers.1.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
24
+ "embedding_model.layers.1.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
25
+ "embedding_model.layers.10.input_layernorm.weight": "model-00002-of-00004.safetensors",
26
+ "embedding_model.layers.10.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
27
+ "embedding_model.layers.10.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
28
+ "embedding_model.layers.10.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
29
+ "embedding_model.layers.10.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
30
+ "embedding_model.layers.10.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
31
+ "embedding_model.layers.10.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
32
+ "embedding_model.layers.10.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
33
+ "embedding_model.layers.10.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
34
+ "embedding_model.layers.11.input_layernorm.weight": "model-00002-of-00004.safetensors",
35
+ "embedding_model.layers.11.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
36
+ "embedding_model.layers.11.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
37
+ "embedding_model.layers.11.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
38
+ "embedding_model.layers.11.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
39
+ "embedding_model.layers.11.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
40
+ "embedding_model.layers.11.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
41
+ "embedding_model.layers.11.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
42
+ "embedding_model.layers.11.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
43
+ "embedding_model.layers.12.input_layernorm.weight": "model-00002-of-00004.safetensors",
44
+ "embedding_model.layers.12.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
45
+ "embedding_model.layers.12.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
46
+ "embedding_model.layers.12.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
47
+ "embedding_model.layers.12.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
48
+ "embedding_model.layers.12.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
49
+ "embedding_model.layers.12.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
50
+ "embedding_model.layers.12.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
51
+ "embedding_model.layers.12.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
52
+ "embedding_model.layers.13.input_layernorm.weight": "model-00002-of-00004.safetensors",
53
+ "embedding_model.layers.13.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
54
+ "embedding_model.layers.13.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
55
+ "embedding_model.layers.13.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
56
+ "embedding_model.layers.13.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
57
+ "embedding_model.layers.13.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
58
+ "embedding_model.layers.13.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
59
+ "embedding_model.layers.13.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
60
+ "embedding_model.layers.13.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
61
+ "embedding_model.layers.14.input_layernorm.weight": "model-00002-of-00004.safetensors",
62
+ "embedding_model.layers.14.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
63
+ "embedding_model.layers.14.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
64
+ "embedding_model.layers.14.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
65
+ "embedding_model.layers.14.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
66
+ "embedding_model.layers.14.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
67
+ "embedding_model.layers.14.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
68
+ "embedding_model.layers.14.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
69
+ "embedding_model.layers.14.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
70
+ "embedding_model.layers.15.input_layernorm.weight": "model-00002-of-00004.safetensors",
71
+ "embedding_model.layers.15.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
72
+ "embedding_model.layers.15.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
73
+ "embedding_model.layers.15.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
74
+ "embedding_model.layers.15.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
75
+ "embedding_model.layers.15.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
76
+ "embedding_model.layers.15.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
77
+ "embedding_model.layers.15.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
78
+ "embedding_model.layers.15.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
79
+ "embedding_model.layers.16.input_layernorm.weight": "model-00002-of-00004.safetensors",
80
+ "embedding_model.layers.16.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
81
+ "embedding_model.layers.16.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
82
+ "embedding_model.layers.16.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
83
+ "embedding_model.layers.16.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
84
+ "embedding_model.layers.16.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
85
+ "embedding_model.layers.16.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
86
+ "embedding_model.layers.16.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
87
+ "embedding_model.layers.16.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
88
+ "embedding_model.layers.17.input_layernorm.weight": "model-00002-of-00004.safetensors",
89
+ "embedding_model.layers.17.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
90
+ "embedding_model.layers.17.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
91
+ "embedding_model.layers.17.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
92
+ "embedding_model.layers.17.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
93
+ "embedding_model.layers.17.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
94
+ "embedding_model.layers.17.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
95
+ "embedding_model.layers.17.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
96
+ "embedding_model.layers.17.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
97
+ "embedding_model.layers.18.input_layernorm.weight": "model-00003-of-00004.safetensors",
98
+ "embedding_model.layers.18.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
99
+ "embedding_model.layers.18.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
100
+ "embedding_model.layers.18.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
101
+ "embedding_model.layers.18.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
102
+ "embedding_model.layers.18.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
103
+ "embedding_model.layers.18.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
104
+ "embedding_model.layers.18.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
105
+ "embedding_model.layers.18.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
106
+ "embedding_model.layers.19.input_layernorm.weight": "model-00003-of-00004.safetensors",
107
+ "embedding_model.layers.19.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
108
+ "embedding_model.layers.19.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
109
+ "embedding_model.layers.19.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
110
+ "embedding_model.layers.19.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
111
+ "embedding_model.layers.19.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
112
+ "embedding_model.layers.19.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
113
+ "embedding_model.layers.19.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
114
+ "embedding_model.layers.19.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
115
+ "embedding_model.layers.2.input_layernorm.weight": "model-00001-of-00004.safetensors",
116
+ "embedding_model.layers.2.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
117
+ "embedding_model.layers.2.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
118
+ "embedding_model.layers.2.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
119
+ "embedding_model.layers.2.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
120
+ "embedding_model.layers.2.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
121
+ "embedding_model.layers.2.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
122
+ "embedding_model.layers.2.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
123
+ "embedding_model.layers.2.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
124
+ "embedding_model.layers.20.input_layernorm.weight": "model-00003-of-00004.safetensors",
125
+ "embedding_model.layers.20.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
126
+ "embedding_model.layers.20.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
127
+ "embedding_model.layers.20.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
128
+ "embedding_model.layers.20.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
129
+ "embedding_model.layers.20.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
130
+ "embedding_model.layers.20.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
131
+ "embedding_model.layers.20.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
132
+ "embedding_model.layers.20.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
133
+ "embedding_model.layers.21.input_layernorm.weight": "model-00003-of-00004.safetensors",
134
+ "embedding_model.layers.21.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
135
+ "embedding_model.layers.21.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
136
+ "embedding_model.layers.21.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
137
+ "embedding_model.layers.21.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
138
+ "embedding_model.layers.21.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
139
+ "embedding_model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
140
+ "embedding_model.layers.21.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
141
+ "embedding_model.layers.21.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
142
+ "embedding_model.layers.22.input_layernorm.weight": "model-00003-of-00004.safetensors",
143
+ "embedding_model.layers.22.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
144
+ "embedding_model.layers.22.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
145
+ "embedding_model.layers.22.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
146
+ "embedding_model.layers.22.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
147
+ "embedding_model.layers.22.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
148
+ "embedding_model.layers.22.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
149
+ "embedding_model.layers.22.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
150
+ "embedding_model.layers.22.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
151
+ "embedding_model.layers.23.input_layernorm.weight": "model-00003-of-00004.safetensors",
152
+ "embedding_model.layers.23.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
153
+ "embedding_model.layers.23.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
154
+ "embedding_model.layers.23.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
155
+ "embedding_model.layers.23.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
156
+ "embedding_model.layers.23.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
157
+ "embedding_model.layers.23.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
158
+ "embedding_model.layers.23.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
159
+ "embedding_model.layers.23.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
160
+ "embedding_model.layers.24.input_layernorm.weight": "model-00003-of-00004.safetensors",
161
+ "embedding_model.layers.24.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
162
+ "embedding_model.layers.24.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
163
+ "embedding_model.layers.24.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
164
+ "embedding_model.layers.24.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
165
+ "embedding_model.layers.24.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
166
+ "embedding_model.layers.24.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
167
+ "embedding_model.layers.24.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
168
+ "embedding_model.layers.24.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
169
+ "embedding_model.layers.25.input_layernorm.weight": "model-00003-of-00004.safetensors",
170
+ "embedding_model.layers.25.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
171
+ "embedding_model.layers.25.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
172
+ "embedding_model.layers.25.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
173
+ "embedding_model.layers.25.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
174
+ "embedding_model.layers.25.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
175
+ "embedding_model.layers.25.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
176
+ "embedding_model.layers.25.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
177
+ "embedding_model.layers.25.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
178
+ "embedding_model.layers.26.input_layernorm.weight": "model-00003-of-00004.safetensors",
179
+ "embedding_model.layers.26.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
180
+ "embedding_model.layers.26.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
181
+ "embedding_model.layers.26.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
182
+ "embedding_model.layers.26.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
183
+ "embedding_model.layers.26.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
184
+ "embedding_model.layers.26.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
185
+ "embedding_model.layers.26.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
186
+ "embedding_model.layers.26.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
187
+ "embedding_model.layers.27.input_layernorm.weight": "model-00003-of-00004.safetensors",
188
+ "embedding_model.layers.27.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
189
+ "embedding_model.layers.27.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
190
+ "embedding_model.layers.27.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
191
+ "embedding_model.layers.27.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
192
+ "embedding_model.layers.27.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
193
+ "embedding_model.layers.27.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
194
+ "embedding_model.layers.27.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
195
+ "embedding_model.layers.27.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
196
+ "embedding_model.layers.28.input_layernorm.weight": "model-00003-of-00004.safetensors",
197
+ "embedding_model.layers.28.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
198
+ "embedding_model.layers.28.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
199
+ "embedding_model.layers.28.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
200
+ "embedding_model.layers.28.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
201
+ "embedding_model.layers.28.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
202
+ "embedding_model.layers.28.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
203
+ "embedding_model.layers.28.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
204
+ "embedding_model.layers.28.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
205
+ "embedding_model.layers.29.input_layernorm.weight": "model-00003-of-00004.safetensors",
206
+ "embedding_model.layers.29.mlp.down_proj.weight": "model-00003-of-00004.safetensors",
207
+ "embedding_model.layers.29.mlp.gate_proj.weight": "model-00003-of-00004.safetensors",
208
+ "embedding_model.layers.29.mlp.up_proj.weight": "model-00003-of-00004.safetensors",
209
+ "embedding_model.layers.29.post_attention_layernorm.weight": "model-00003-of-00004.safetensors",
210
+ "embedding_model.layers.29.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
211
+ "embedding_model.layers.29.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
212
+ "embedding_model.layers.29.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
213
+ "embedding_model.layers.29.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
214
+ "embedding_model.layers.3.input_layernorm.weight": "model-00001-of-00004.safetensors",
215
+ "embedding_model.layers.3.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
216
+ "embedding_model.layers.3.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
217
+ "embedding_model.layers.3.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
218
+ "embedding_model.layers.3.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
219
+ "embedding_model.layers.3.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
220
+ "embedding_model.layers.3.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
221
+ "embedding_model.layers.3.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
222
+ "embedding_model.layers.3.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
223
+ "embedding_model.layers.30.input_layernorm.weight": "model-00004-of-00004.safetensors",
224
+ "embedding_model.layers.30.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
225
+ "embedding_model.layers.30.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
226
+ "embedding_model.layers.30.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
227
+ "embedding_model.layers.30.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
228
+ "embedding_model.layers.30.self_attn.k_proj.weight": "model-00003-of-00004.safetensors",
229
+ "embedding_model.layers.30.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
230
+ "embedding_model.layers.30.self_attn.q_proj.weight": "model-00003-of-00004.safetensors",
231
+ "embedding_model.layers.30.self_attn.v_proj.weight": "model-00003-of-00004.safetensors",
232
+ "embedding_model.layers.31.input_layernorm.weight": "model-00004-of-00004.safetensors",
233
+ "embedding_model.layers.31.mlp.down_proj.weight": "model-00004-of-00004.safetensors",
234
+ "embedding_model.layers.31.mlp.gate_proj.weight": "model-00004-of-00004.safetensors",
235
+ "embedding_model.layers.31.mlp.up_proj.weight": "model-00004-of-00004.safetensors",
236
+ "embedding_model.layers.31.post_attention_layernorm.weight": "model-00004-of-00004.safetensors",
237
+ "embedding_model.layers.31.self_attn.k_proj.weight": "model-00004-of-00004.safetensors",
238
+ "embedding_model.layers.31.self_attn.o_proj.weight": "model-00004-of-00004.safetensors",
239
+ "embedding_model.layers.31.self_attn.q_proj.weight": "model-00004-of-00004.safetensors",
240
+ "embedding_model.layers.31.self_attn.v_proj.weight": "model-00004-of-00004.safetensors",
241
+ "embedding_model.layers.4.input_layernorm.weight": "model-00001-of-00004.safetensors",
242
+ "embedding_model.layers.4.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
243
+ "embedding_model.layers.4.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
244
+ "embedding_model.layers.4.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
245
+ "embedding_model.layers.4.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
246
+ "embedding_model.layers.4.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
247
+ "embedding_model.layers.4.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
248
+ "embedding_model.layers.4.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
249
+ "embedding_model.layers.4.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
250
+ "embedding_model.layers.5.input_layernorm.weight": "model-00001-of-00004.safetensors",
251
+ "embedding_model.layers.5.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
252
+ "embedding_model.layers.5.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
253
+ "embedding_model.layers.5.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
254
+ "embedding_model.layers.5.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
255
+ "embedding_model.layers.5.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
256
+ "embedding_model.layers.5.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
257
+ "embedding_model.layers.5.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
258
+ "embedding_model.layers.5.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
259
+ "embedding_model.layers.6.input_layernorm.weight": "model-00001-of-00004.safetensors",
260
+ "embedding_model.layers.6.mlp.down_proj.weight": "model-00001-of-00004.safetensors",
261
+ "embedding_model.layers.6.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
262
+ "embedding_model.layers.6.mlp.up_proj.weight": "model-00001-of-00004.safetensors",
263
+ "embedding_model.layers.6.post_attention_layernorm.weight": "model-00001-of-00004.safetensors",
264
+ "embedding_model.layers.6.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
265
+ "embedding_model.layers.6.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
266
+ "embedding_model.layers.6.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
267
+ "embedding_model.layers.6.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
268
+ "embedding_model.layers.7.input_layernorm.weight": "model-00002-of-00004.safetensors",
269
+ "embedding_model.layers.7.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
270
+ "embedding_model.layers.7.mlp.gate_proj.weight": "model-00001-of-00004.safetensors",
271
+ "embedding_model.layers.7.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
272
+ "embedding_model.layers.7.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
273
+ "embedding_model.layers.7.self_attn.k_proj.weight": "model-00001-of-00004.safetensors",
274
+ "embedding_model.layers.7.self_attn.o_proj.weight": "model-00001-of-00004.safetensors",
275
+ "embedding_model.layers.7.self_attn.q_proj.weight": "model-00001-of-00004.safetensors",
276
+ "embedding_model.layers.7.self_attn.v_proj.weight": "model-00001-of-00004.safetensors",
277
+ "embedding_model.layers.8.input_layernorm.weight": "model-00002-of-00004.safetensors",
278
+ "embedding_model.layers.8.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
279
+ "embedding_model.layers.8.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
280
+ "embedding_model.layers.8.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
281
+ "embedding_model.layers.8.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
282
+ "embedding_model.layers.8.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
283
+ "embedding_model.layers.8.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
284
+ "embedding_model.layers.8.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
285
+ "embedding_model.layers.8.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
286
+ "embedding_model.layers.9.input_layernorm.weight": "model-00002-of-00004.safetensors",
287
+ "embedding_model.layers.9.mlp.down_proj.weight": "model-00002-of-00004.safetensors",
288
+ "embedding_model.layers.9.mlp.gate_proj.weight": "model-00002-of-00004.safetensors",
289
+ "embedding_model.layers.9.mlp.up_proj.weight": "model-00002-of-00004.safetensors",
290
+ "embedding_model.layers.9.post_attention_layernorm.weight": "model-00002-of-00004.safetensors",
291
+ "embedding_model.layers.9.self_attn.k_proj.weight": "model-00002-of-00004.safetensors",
292
+ "embedding_model.layers.9.self_attn.o_proj.weight": "model-00002-of-00004.safetensors",
293
+ "embedding_model.layers.9.self_attn.q_proj.weight": "model-00002-of-00004.safetensors",
294
+ "embedding_model.layers.9.self_attn.v_proj.weight": "model-00002-of-00004.safetensors",
295
+ "embedding_model.norm.weight": "model-00004-of-00004.safetensors",
296
+ "latent_attention_model.cross_attend_blocks.0.fn.to_kv.weight": "model-00001-of-00004.safetensors",
297
+ "latent_attention_model.cross_attend_blocks.0.fn.to_out.weight": "model-00001-of-00004.safetensors",
298
+ "latent_attention_model.cross_attend_blocks.0.fn.to_q.weight": "model-00001-of-00004.safetensors",
299
+ "latent_attention_model.cross_attend_blocks.0.norm.bias": "model-00001-of-00004.safetensors",
300
+ "latent_attention_model.cross_attend_blocks.0.norm.weight": "model-00001-of-00004.safetensors",
301
+ "latent_attention_model.cross_attend_blocks.0.norm_context.bias": "model-00001-of-00004.safetensors",
302
+ "latent_attention_model.cross_attend_blocks.0.norm_context.weight": "model-00001-of-00004.safetensors",
303
+ "latent_attention_model.cross_attend_blocks.1.fn.net.0.bias": "model-00001-of-00004.safetensors",
304
+ "latent_attention_model.cross_attend_blocks.1.fn.net.0.weight": "model-00001-of-00004.safetensors",
305
+ "latent_attention_model.cross_attend_blocks.1.fn.net.2.bias": "model-00001-of-00004.safetensors",
306
+ "latent_attention_model.cross_attend_blocks.1.fn.net.2.weight": "model-00001-of-00004.safetensors",
307
+ "latent_attention_model.cross_attend_blocks.1.norm.bias": "model-00001-of-00004.safetensors",
308
+ "latent_attention_model.cross_attend_blocks.1.norm.weight": "model-00001-of-00004.safetensors",
309
+ "latent_attention_model.latents": "model-00001-of-00004.safetensors"
310
+ }
311
+ }
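The `weight_map` above is how sharded checkpoints are resolved: each tensor name maps to the shard file that stores it. A minimal sketch of how such an index is consumed (a hypothetical miniature index for illustration; the real `model.safetensors.index.json` is read the same way):

```python
import json

# Miniature stand-in for the "weight_map" of model.safetensors.index.json
# (two real entries from the index above).
index = json.loads("""
{
  "weight_map": {
    "embedding_model.layers.21.self_attn.o_proj.weight": "model-00003-of-00004.safetensors",
    "embedding_model.norm.weight": "model-00004-of-00004.safetensors"
  }
}
""")

def shard_for(tensor_name: str) -> str:
    """Return the shard file that stores the given tensor."""
    return index["weight_map"][tensor_name]

print(shard_for("embedding_model.norm.weight"))  # → model-00004-of-00004.safetensors
```

A loader groups tensor names by shard so each shard file is opened only once.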
modeling_nvembed.py ADDED
@@ -0,0 +1,441 @@
+ from typing import List, Union, Dict, Mapping, Optional, Tuple, TypedDict
+ import torch
+ import os
+ import json
+ import numpy as np
+ from functools import partial
+ from contextlib import nullcontext
+ from transformers import AutoModel, PreTrainedTokenizerFast, BatchEncoding, DataCollatorWithPadding
+ from transformers.modeling_utils import PreTrainedModel
+ from transformers.models.auto import AutoTokenizer
+ from transformers.models.mistral.modeling_mistral import MISTRAL_INPUTS_DOCSTRING
+ from transformers.modeling_outputs import BaseModelOutputWithPast, BaseModelOutputWithNoAttention
+ from transformers.modeling_attn_mask_utils import _prepare_4d_attention_mask, _prepare_4d_attention_mask_for_sdpa
+ from transformers import MistralModel, MistralConfig
+ from transformers.cache_utils import Cache, DynamicCache
+ from transformers.utils import (
+     add_start_docstrings_to_model_forward,
+     logging,
+ )
+ from einops import rearrange, repeat
+ from tqdm.auto import tqdm
+ from datasets import Dataset
+ from torch.utils.data import DataLoader
+ from .configuration_nvembed import NVEmbedConfig, LatentAttentionConfig, BidirectionalMistralConfig
+
+ logger = logging.get_logger(__name__)
+
+ class NVEmbedFeatures(TypedDict):
+     input_ids: torch.Tensor
+     attention_mask: torch.Tensor
+     pool_mask: torch.Tensor
+
+ class BidirectionalMistralModel(MistralModel):
+     config_class = BidirectionalMistralConfig
+
+     def __init__(self, config: MistralConfig):
+         super().__init__(config)
+         for layer in self.layers:
+             layer.self_attn.is_causal = False
+         self._attn_implementation = "eager"
+
+     @add_start_docstrings_to_model_forward(MISTRAL_INPUTS_DOCSTRING)
+     def forward(
+         self,
+         input_ids: torch.LongTensor = None,
+         attention_mask: Optional[torch.Tensor] = None,
+         position_ids: Optional[torch.LongTensor] = None,
+         past_key_values: Optional[List[torch.FloatTensor]] = None,
+         inputs_embeds: Optional[torch.FloatTensor] = None,
+         use_cache: Optional[bool] = None,
+         output_attentions: Optional[bool] = None,
+         output_hidden_states: Optional[bool] = None,
+         return_dict: Optional[bool] = None,
+     ) -> Union[Tuple, BaseModelOutputWithPast]:
+         output_attentions = output_attentions if output_attentions is not None else self.config.output_attentions
+         output_hidden_states = (
+             output_hidden_states if output_hidden_states is not None else self.config.output_hidden_states
+         )
+         use_cache = use_cache if use_cache is not None else self.config.use_cache
+
+         return_dict = return_dict if return_dict is not None else self.config.use_return_dict
+
+         # retrieve input_ids and inputs_embeds
+         if input_ids is not None and inputs_embeds is not None:
+             raise ValueError("You cannot specify both decoder_input_ids and decoder_inputs_embeds at the same time")
+         elif input_ids is not None:
+             batch_size, seq_length = input_ids.shape
+         elif inputs_embeds is not None:
+             batch_size, seq_length, _ = inputs_embeds.shape
+         else:
+             raise ValueError("You have to specify either decoder_input_ids or decoder_inputs_embeds")
+
+         if self.gradient_checkpointing and self.training:
+             if use_cache:
+                 logger.warning_once(
+                     "`use_cache=True` is incompatible with gradient checkpointing. Setting `use_cache=False`..."
+                 )
+                 use_cache = False
+
+         past_key_values_length = 0
+
+         if use_cache:
+             use_legacy_cache = not isinstance(past_key_values, Cache)
+             if use_legacy_cache:
+                 past_key_values = DynamicCache.from_legacy_cache(past_key_values)
+             past_key_values_length = past_key_values.get_usable_length(seq_length)
+
+         if position_ids is None:
+             device = input_ids.device if input_ids is not None else inputs_embeds.device
+             position_ids = torch.arange(
+                 past_key_values_length, seq_length + past_key_values_length, dtype=torch.long, device=device
+             )
+             position_ids = position_ids.unsqueeze(0).view(-1, seq_length)
+         else:
+             position_ids = position_ids.view(-1, seq_length).long()
+
+         if inputs_embeds is None:
+             inputs_embeds = self.embed_tokens(input_ids)
+
+         if attention_mask is not None and self._attn_implementation == "flash_attention_2" and use_cache:
+             is_padding_right = attention_mask[:, -1].sum().item() != batch_size
+             if is_padding_right:
+                 raise ValueError(
+                     "You are attempting to perform batched generation with padding_side='right'"
+                     " this may lead to unexpected behaviour for Flash Attention version of Mistral. Make sure to "
+                     " call `tokenizer.padding_side = 'left'` before tokenizing the input. "
+                 )
+
+         if self._attn_implementation == "flash_attention_2":
+             # 2d mask is passed through the layers
+             attention_mask = attention_mask if (attention_mask is not None and 0 in attention_mask) else None
+         elif self._attn_implementation == "sdpa" and not output_attentions:
+             # output_attentions=True can not be supported when using SDPA, and we fall back on
+             # the manual implementation that requires a 4D causal mask in all cases.
+             attention_mask = _prepare_4d_attention_mask_for_sdpa(
+                 attention_mask, inputs_embeds.dtype
+             )
+         else:
+             # 4d mask is passed through the layers
+             attention_mask = _prepare_4d_attention_mask(
+                 attention_mask, inputs_embeds.dtype,
+             )
+
+         hidden_states = inputs_embeds
+
+         # decoder layers
+         all_hidden_states = () if output_hidden_states else None
+         all_self_attns = () if output_attentions else None
+         next_decoder_cache = None
+
+         for decoder_layer in self.layers:
+             if output_hidden_states:
+                 all_hidden_states += (hidden_states,)
+
+             if self.gradient_checkpointing and self.training:
+                 layer_outputs = self._gradient_checkpointing_func(
+                     decoder_layer.__call__,
+                     hidden_states,
+                     attention_mask,
+                     position_ids,
+                     past_key_values,
+                     output_attentions,
+                     use_cache,
+                 )
+             else:
+                 layer_outputs = decoder_layer(
+                     hidden_states,
+                     attention_mask=attention_mask,
+                     position_ids=position_ids,
+                     past_key_value=past_key_values,
+                     output_attentions=output_attentions,
+                     use_cache=use_cache,
+                 )
+
+             hidden_states = layer_outputs[0]
+
+             if use_cache:
+                 next_decoder_cache = layer_outputs[2 if output_attentions else 1]
+
+             if output_attentions:
+                 all_self_attns += (layer_outputs[1],)
+
+         hidden_states = self.norm(hidden_states)
+
+         # add hidden states from the last decoder layer
+         if output_hidden_states:
+             all_hidden_states += (hidden_states,)
+
+         next_cache = None
+         if use_cache:
+             next_cache = next_decoder_cache.to_legacy_cache() if use_legacy_cache else next_decoder_cache
+
+         if not return_dict:
+             return tuple(v for v in [hidden_states, next_cache, all_hidden_states, all_self_attns] if v is not None)
+         return BaseModelOutputWithPast(
+             last_hidden_state=hidden_states,
+             past_key_values=next_cache,
+             hidden_states=all_hidden_states,
+             attentions=all_self_attns,
+         )
+
+ def _move_to_device(maybe_tensor, device: torch.device):
+     if torch.is_tensor(maybe_tensor):
+         return maybe_tensor.to(device, non_blocking=device.type == "cuda")
+     elif isinstance(maybe_tensor, dict):
+         return {key: _move_to_device(value, device) for key, value in maybe_tensor.items()}
+     elif isinstance(maybe_tensor, list):
+         return [_move_to_device(x, device) for x in maybe_tensor]
+     elif isinstance(maybe_tensor, tuple):
+         return tuple([_move_to_device(x, device) for x in maybe_tensor])
+     elif isinstance(maybe_tensor, Mapping):
+         return type(maybe_tensor)({k: _move_to_device(v, device) for k, v in maybe_tensor.items()})
+     else:
+         return maybe_tensor
+
+ def move_to_device(sample, device: torch.device):
+     if device.type == "cpu":
+         return sample
+
+     if len(sample) == 0:
+         return {}
+     return _move_to_device(sample, device)
+
+
+ def input_transform_func(
+     tokenizer: PreTrainedTokenizerFast,
+     examples: Dict[str, List],
+     always_add_eos: bool,
+     max_length: int,
+     instruction: str,
+ ) -> BatchEncoding:
+     if always_add_eos:
+         examples['input_texts'] = [instruction + input_example + tokenizer.eos_token for input_example in examples['input_texts']]
+     batch_dict = tokenizer(
+         examples['input_texts'],
+         max_length=max_length,
+         padding=True,
+         return_token_type_ids=False,
+         return_tensors="pt",
+         truncation=True)
+     return batch_dict
+
+
+ class PreNorm(torch.nn.Module):
+     def __init__(self, dim, fn, context_dim = None):
+         super().__init__()
+         self.fn = fn
+         self.norm = torch.nn.LayerNorm(dim)
+         self.norm_context = torch.nn.LayerNorm(context_dim) if exists(context_dim) else None
+
+     def forward(self, x, **kwargs):
+         x = self.norm(x)
+         if exists(self.norm_context):
+             context = kwargs['context']
+             normed_context = self.norm_context(context)
+             kwargs.update(context = normed_context)
+         return self.fn(x, **kwargs)
+
+ class GEGLU(torch.nn.Module):
+     def forward(self, x):
+         x, gates = x.chunk(2, dim = -1)
+         return x * torch.nn.functional.gelu(gates)
+
+ class FeedForward(torch.nn.Module):
+     def __init__(self, dim, mult = 4):
+         super().__init__()
+         self.net = torch.nn.Sequential(torch.nn.Linear(dim, dim * mult * 2),
+                                        GEGLU(),
+                                        torch.nn.Linear(dim * mult, dim))
+
+     def forward(self, x):
+         return self.net(x)
+
+ def exists(val):
+     return val is not None
+
+ def default(val, d):
+     return val if exists(val) else d
+
+
+ class Attention(torch.nn.Module):
+     def __init__(self, query_dim, context_dim = None, heads = 8, dim_head = 64):
+         super().__init__()
+         inner_dim = dim_head * heads
+         context_dim = default(context_dim, query_dim)
+         self.scale = dim_head ** -0.5
+         self.heads = heads
+
+         self.to_q = torch.nn.Linear(query_dim, inner_dim, bias = False)
+         self.to_kv = torch.nn.Linear(context_dim, inner_dim * 2, bias = False)
+         self.to_out = torch.nn.Linear(inner_dim, query_dim, bias = False)
+
+     def forward(self, x, context = None, mask = None):
+         h = self.heads
+         q = self.to_q(x)
+         context = default(context, x)
+         k, v = self.to_kv(context).chunk(2, dim = -1)
+         q, k, v = map(lambda t: rearrange(t, 'b n (h d) -> (b h) n d', h = h), (q, k, v))
+         with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_mem_efficient=True):
+             out = torch.nn.functional.scaled_dot_product_attention(q, k, v)
+         out = rearrange(out, '(b h) n d -> b n (h d)', h = h)
+         return self.to_out(out)
+
+
+ class LatentAttentionModel(PreTrainedModel):
+     config_class = LatentAttentionConfig
+
+     def __init__(self, config: LatentAttentionConfig):
+         super().__init__(config)
+         ## cross-attention block
+         num_latents, latent_dim, cross_heads, cross_dim_head = config.num_latents_value, config.latent_dim, config.num_cross_heads, config.cross_dim_head
+         dim = config.hidden_dim
+         # init latent_attention and latents
+         self.cross_attend_blocks = torch.nn.ModuleList([
+             PreNorm(latent_dim, Attention(latent_dim, dim, heads = cross_heads, dim_head = cross_dim_head),
+                     context_dim = dim),
+             PreNorm(latent_dim, FeedForward(latent_dim)),
+         ])
+         self.output_normalize = config.output_normalize
+         self.register_parameter("latents", torch.nn.Parameter(torch.randn(num_latents, latent_dim)))
+
+     def forward(self, hiddens, attention_mask: torch.Tensor=None):
+         ## cross-attention block
+         cross_attn, cross_ff = self.cross_attend_blocks
+         b, *_, device = *hiddens.shape, hiddens.device
+         x = repeat(self.latents, 'n d -> b n d', b = b)
+         hiddens = cross_attn(hiddens, context = x, mask = None) + hiddens
+         hiddens = cross_ff(hiddens) + hiddens
+         if attention_mask is not None:
+             s = torch.sum(hiddens * attention_mask.unsqueeze(-1).float(), dim=1)
+             d = attention_mask.sum(dim=1, keepdim=True).float()
+             hiddens = s / d
+             if self.output_normalize:
+                 hiddens = torch.nn.functional.normalize(hiddens, p=2, dim=-1)
+         return hiddens
+
317
+ class NVEmbedModel(PreTrainedModel):
318
+ config_class = NVEmbedConfig
319
+ _no_split_modules = ["MistralDecoderLayer", "LatentAttentionModel"]
320
+
321
+ def __init__(self, config: NVEmbedConfig):
322
+ super().__init__(config)
323
+ self.latent_attention_model = AutoModel.from_config(config.latent_attention_config)
324
+ self.embedding_model = AutoModel.from_config(
325
+ config.text_config,
326
+ ) if config.text_config is not None else None
327
+ self.tokenizer = AutoTokenizer.from_pretrained(config.text_config._name_or_path) if config.text_config is not None else None
328
+ self.padding_side = config.padding_side
329
+ self.is_mask_instruction = config.is_mask_instruction
330
+ self.add_eos = config.add_eos
331
+ self.mask_type = config.mask_type
332
+ if config.add_pad_token and self.tokenizer is not None:
333
+ self.add_pad_token()
334
+
335
+ def add_pad_token(self):
336
+ self.tokenizer.pad_token = self.tokenizer.eos_token
337
+ self.tokenizer.padding_side = self.padding_side
338
+
339
+ def prepare_kwargs_from_batch(self, batch_dict: dict, instruction_lens: int, device: torch.device):
340
+ batch_dict = move_to_device(batch_dict, device)
341
+ attention_mask = batch_dict['attention_mask'].clone() if 'attention_mask' in batch_dict else None
342
+ if (attention_mask is not None and
343
+ self.padding_side == "right" and
344
+ self.is_mask_instruction == True and
345
+ instruction_lens > 0):
346
+ # Mask out the instruction tokens for mean-pooling
347
+ attention_mask[:, :instruction_lens] = 0
348
+ features: NVEmbedFeatures = {
349
+ 'input_ids': torch.tensor(batch_dict.get('input_ids').to(batch_dict.get('input_ids')).long()),
350
+ 'attention_mask': batch_dict['attention_mask'],
351
+ 'pool_mask': attention_mask,
352
+ }
353
+ return features
354
+
355
+ @torch.no_grad()
356
+ def _do_encode(self,
357
+ prompts: List[str],
358
+ batch_size: int=1,
359
+ instruction: str="",
360
+ max_length: int=4096,
361
+ num_workers: int=32,
362
+ **kwargs
363
+ ) -> Union[np.ndarray, torch.FloatTensor]:
364
+ dataset: Dataset = Dataset.from_dict({'input_texts': prompts})
365
+ dataset.set_transform(partial(input_transform_func,
366
+ self.tokenizer,
367
+ always_add_eos=True,
368
+ max_length=max_length,
369
+ instruction=instruction))
370
+
371
+ data_collator = DataCollatorWithPadding(self.tokenizer)
372
+ data_loader = DataLoader(
373
+ dataset,
374
+ batch_size=batch_size,
375
+ shuffle=False,
376
+            drop_last=False,
+            num_workers=num_workers,
+            collate_fn=data_collator,
+            pin_memory=True)
+
+        if self.padding_side == "right" and self.is_mask_instruction == True and len(instruction) > 0:
+            instruction_lens = len(self.tokenizer.tokenize(instruction))
+        else:
+            instruction_lens = 0
+
+        encoded_embeds = []
+        device = next(self.embedding_model.parameters()).device
+        for batch_dict in tqdm(data_loader, desc='encoding', mininterval=10):
+            features = self.prepare_kwargs_from_batch(batch_dict, instruction_lens, device=device)
+            embeds = self(**features)["sentence_embeddings"].squeeze(1)
+            encoded_embeds.append(embeds)
+        encoded_embeds = torch.cat(encoded_embeds, axis=0)
+        if "return_numpy" in kwargs and kwargs.get("return_numpy"):
+            encoded_embeds = encoded_embeds.cpu().detach().numpy()
+        return encoded_embeds
+
+    def forward(self, input_ids: torch.Tensor, attention_mask: torch.Tensor, pool_mask: Optional[torch.Tensor]=None, return_dict: bool=True):
+        autocast_ctx = torch.autocast if torch.cuda.is_available() else nullcontext
+        with autocast_ctx("cuda"):
+            ## decoder only layer
+            outputs = self.embedding_model(
+                input_ids=input_ids,
+                attention_mask=attention_mask,
+            )
+            ## latent attention layer
+            embeds = self.latent_attention_model(
+                outputs.last_hidden_state,
+                pool_mask,
+            )
+        if not return_dict:
+            return (embeds,)
+        return BaseModelOutputWithNoAttention(last_hidden_state=embeds)
+
+
+    @torch.no_grad()
+    def encode(self, prompts: List[str], instruction: str="", max_length: int=4096, **kwargs):
+        if self.padding_side == "right" and self.is_mask_instruction == True and len(instruction) > 0:
+            instruction_lens = len(self.tokenizer.tokenize(instruction))
+        else:
+            instruction_lens = 0
+
+        device = next(self.embedding_model.parameters()).device
+        batch_dict = input_transform_func(self.tokenizer,
+                                          {"input_texts": [prompt for prompt in prompts]},
+                                          always_add_eos=True,
+                                          max_length=max_length,
+                                          instruction=instruction)
+
+        features: NVEmbedFeatures = self.prepare_kwargs_from_batch(batch_dict, instruction_lens, device=device)
+        return self(**features)["sentence_embeddings"].squeeze(1)
+
+
+## AutoModel Register
+AutoModel.register(NVEmbedConfig, NVEmbedModel)
+AutoModel.register(LatentAttentionConfig, LatentAttentionModel)
+AutoModel.register(BidirectionalMistralConfig, BidirectionalMistralModel)
+
+## Register for auto class
+NVEmbedModel.register_for_auto_class("AutoModel")
+LatentAttentionModel.register_for_auto_class("AutoModel")
+BidirectionalMistralModel.register_for_auto_class("AutoModel")
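The `instruction_lens` plumbing above feeds `prepare_kwargs_from_batch`, whose body is not shown in this diff. A plausible NumPy sketch of what instruction masking plus mean pooling could look like — `build_pool_mask` and `masked_mean_pool` are hypothetical names, and the actual model pools through its latent attention layer rather than a plain mean:

```python
import numpy as np

def build_pool_mask(attention_mask, instruction_lens):
    # Hypothetical: with right padding, the instruction occupies the first
    # instruction_lens positions; zero them so pooling skips the instruction.
    pool_mask = attention_mask.copy()
    pool_mask[:, :instruction_lens] = 0
    return pool_mask

def masked_mean_pool(hidden_states, pool_mask):
    # Average hidden states over positions where pool_mask == 1.
    mask = pool_mask[..., None].astype(hidden_states.dtype)
    return (hidden_states * mask).sum(axis=1) / np.clip(mask.sum(axis=1), 1e-9, None)
```

This only illustrates the masking convention; the real `pool_mask` construction lives in the repository's helper code.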
modules.json ADDED
@@ -0,0 +1,26 @@
+[
+  {
+    "idx": 0,
+    "name": "0",
+    "path": "",
+    "type": "sentence_transformers.models.Transformer"
+  },
+  {
+    "idx": 1,
+    "name": "1",
+    "path": "1_Pooling",
+    "type": "sentence_transformers.models.Pooling"
+  },
+  {
+    "idx": 2,
+    "name": "2",
+    "path": "2_Normalize",
+    "type": "sentence_transformers.models.Normalize"
+  },
+  {
+    "idx": 3,
+    "name": "3",
+    "path": "3_CSRSparsity",
+    "type": "sentence_transformers.sparse_encoder.models.CSRSparsity"
+  }
+]
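The final `CSRSparsity` stage in this pipeline keeps only the top-k activations of each (expanded) embedding; per the `3_CSRSparsity/config.json` in this commit, k=32 over a 16384-dimensional hidden space. A minimal NumPy sketch of the inference-time top-k sparsification, assuming everything outside the k largest values is zeroed (`topk_sparsify` is an illustrative name, not the library API, and the auxiliary k_aux/dead-neuron machinery is for training only):

```python
import numpy as np

def topk_sparsify(x, k=32):
    # Keep the k largest activations per row, zero out the rest.
    idx = np.argpartition(x, -k, axis=-1)[..., -k:]
    out = np.zeros_like(x)
    np.put_along_axis(out, idx, np.take_along_axis(x, idx, axis=-1), axis=-1)
    return out
```

The resulting vectors are mostly zeros, which is what makes them cheap to store and match as sparse embeddings.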
sentence_bert_config.json ADDED
@@ -0,0 +1,4 @@
+{
+  "max_seq_length": 32768,
+  "do_lower_case": false
+}
special_tokens_map.json ADDED
@@ -0,0 +1,30 @@
+{
+  "bos_token": {
+    "content": "<s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "eos_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "pad_token": {
+    "content": "</s>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  },
+  "unk_token": {
+    "content": "<unk>",
+    "lstrip": false,
+    "normalized": false,
+    "rstrip": false,
+    "single_word": false
+  }
+}
tokenizer.json ADDED
The diff for this file is too large to render. See raw diff
 
tokenizer.model ADDED
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:dadfd56d766715c61d2ef780a525ab43b8e6da4de6865bda3d95fdef5e134055
+size 493443
tokenizer_config.json ADDED
@@ -0,0 +1,43 @@
+{
+  "add_bos_token": true,
+  "add_eos_token": false,
+  "add_prefix_space": null,
+  "added_tokens_decoder": {
+    "0": {
+      "content": "<unk>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "1": {
+      "content": "<s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    },
+    "2": {
+      "content": "</s>",
+      "lstrip": false,
+      "normalized": false,
+      "rstrip": false,
+      "single_word": false,
+      "special": true
+    }
+  },
+  "additional_special_tokens": [],
+  "bos_token": "<s>",
+  "clean_up_tokenization_spaces": false,
+  "eos_token": "</s>",
+  "legacy": true,
+  "model_max_length": 1000000000000000019884624838656,
+  "pad_token": "</s>",
+  "sp_model_kwargs": {},
+  "spaces_between_special_tokens": false,
+  "tokenizer_class": "LlamaTokenizer",
+  "unk_token": "<unk>",
+  "use_default_system_prompt": false
+}
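With `add_bos_token: true`, `add_eos_token: false`, and `pad_token` reusing `"</s>"` (ids 1 and 2 in `added_tokens_decoder` above), batch preparation amounts to prepending the BOS id and right-padding with the EOS id. A pure-Python sketch of that convention — `pad_batch` is an illustrative helper, not part of the tokenizer API:

```python
def pad_batch(token_ids, pad_id=2, bos_id=1, add_bos=True):
    # Mimic this tokenizer_config: prepend <s> (id 1), append no EOS,
    # and right-pad with </s> (id 2), which doubles as the pad token.
    seqs = [([bos_id] + list(ids)) if add_bos else list(ids) for ids in token_ids]
    max_len = max(len(s) for s in seqs)
    input_ids = [s + [pad_id] * (max_len - len(s)) for s in seqs]
    attention_mask = [[1] * len(s) + [0] * (max_len - len(s)) for s in seqs]
    return input_ids, attention_mask
```

Because pad and EOS share an id, the attention mask (not the token value) is what distinguishes padding from a genuine end-of-sequence token.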