# ZYH-LLM-Qwen2.5-14B-V4

*The fourth-generation model of ZYH-LLM-Qwen2.5 has been released!*

*The proportion of the **R1 distillation model** in the model-merging recipe has been increased while maintaining the model's **instruction-following ability** and **general capabilities**.*
## Merge Template

```yaml
merge_method: model_stock
base_model: Instruction Model
models:
  - model: Instruction Fine-tuning Model 1
  - model: Instruction Fine-tuning Model 2
  - model: Inference Fine-tuning Model 1
  - model: Inference Fine-tuning Model 2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
```

Merging with this template improves the model's **computational accuracy** and **reasoning ability** without reducing the **general capabilities** of the instruction model.

**ZYH-LLM-Qwen2.5-V4** used this template throughout its model-merging process.
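For intuition about what `model_stock` does: Model Stock merges a few fine-tuned models toward their shared base using an interpolation ratio derived from the angle between their task vectors (the ratio `t = k·cosθ / ((k−1)·cosθ + 1)` is from the Model Stock paper). Below is a toy numpy sketch over flat weight vectors; it is an illustration of the idea, not mergekit's actual implementation.

```python
import numpy as np

def model_stock_merge(base, finetuned):
    """Toy Model Stock merge over flat weight vectors (illustration only)."""
    k = len(finetuned)
    deltas = [ft - base for ft in finetuned]  # task vectors
    # Average pairwise cosine similarity between task vectors.
    cosines = []
    for i in range(k):
        for j in range(i + 1, k):
            num = float(np.dot(deltas[i], deltas[j]))
            den = float(np.linalg.norm(deltas[i]) * np.linalg.norm(deltas[j]))
            cosines.append(num / den)
    cos_theta = float(np.mean(cosines))
    # Interpolation ratio from the Model Stock paper.
    t = k * cos_theta / ((k - 1) * cos_theta + 1)
    center = np.mean(finetuned, axis=0)  # average of the fine-tuned weights
    return t * center + (1 - t) * base

rng = np.random.default_rng(0)
base = rng.normal(size=8)
tuned = [base + rng.normal(scale=0.1, size=8) for _ in range(4)]
merged = model_stock_merge(base, tuned)
```

When the fine-tuned models agree perfectly (cosine 1), `t` becomes 1 and the merge is simply their average; the more they disagree, the closer the result stays to the base.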
## First stage:
*Create four different instruction models and a code model.*
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-base
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/Virtuoso-Small-v2
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-v2
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: arcee-ai/SuperNova-Medius
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-Nova
```
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Azure99/Blossom-V6-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-V6
```
```yaml
models:
  - model: Qwen/Qwen2.5-Coder-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen/Qwen2.5-Coder-14B
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-Coder-14B-della
```
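Every `della` recipe above uses `density: 1`, `weight: 1`, and `lambda: 0.9`, so no parameters are dropped and each merge amounts to adding a lambda-scaled sum of task vectors to the base. A toy sketch of della's drop-and-rescale idea follows; uniform random pruning stands in for della's magnitude-based sampling, and this is not mergekit's implementation.

```python
import numpy as np

def della_like_merge(base, models, density=1.0, weight=1.0, lam=0.9, seed=0):
    """Toy drop-and-rescale merge over flat weight vectors (illustration only).

    Each task vector is randomly pruned to `density` (kept entries are
    rescaled by 1/density so the expected sum is preserved), the pruned
    vectors are weighted and summed, and `lam` scales the result.
    """
    rng = np.random.default_rng(seed)
    total = np.zeros_like(base, dtype=float)
    for m in models:
        delta = m - base  # task vector
        if density < 1.0:
            mask = rng.random(delta.shape) < density
            delta = np.where(mask, delta / density, 0.0)  # drop and rescale
        total += weight * delta
    return base + lam * total

base = np.zeros(6)
merged = della_like_merge(base, [base + 1.0, base + 3.0])
# density 1 keeps everything: base + 0.9 * (1 + 3) = 3.6 everywhere
```

With `density: 1` the pruning branch never fires, which matches the recipes above: the only effect beyond plain task arithmetic is the `lambda: 0.9` damping.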
## Second stage:

### Step 1:
*Create three reasoning-biased instruction models using the merge template.*
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-Coder-14B-della
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Coder
```
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-V6
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-V6
```
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-Nova
```
### Step 2:
*Create a pure instruction model to restore the generality of the final model.*
```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-base
models:
  - model: Qwen2.5-14B-della-Nova
  - model: Qwen2.5-14B-della-v2
  - model: Qwen2.5-14B-della-V6
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: Qwen2.5-14B-mst-it
```
## Third stage:
*Create a base model with a context length of 1 million tokens.*
```yaml
merge_method: sce
models:
  # Pivot model
  - model: Qwen/Qwen2.5-14B-Instruct-1M
  # Target models
  - model: Qwen/Qwen2.5-14B
base_model: Qwen/Qwen2.5-14B-Instruct-1M
parameters:
  select_topk: 1
dtype: bfloat16
tokenizer_source: base
normalize: true
int8_mask: true
name: Qwen2.5-14B-1M
```
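For context, `sce` roughly works by selecting high-variance elements across the target models' task vectors, deriving per-model merge coefficients from those elements, and erasing sign-conflicting entries before fusing into the pivot; with `select_topk: 1` every element is kept. A loose toy sketch under those assumptions (two synthetic targets, flat weight vectors; not mergekit's implementation):

```python
import numpy as np

def sce_like_merge(pivot, targets, select_topk=1.0):
    """Toy select-calculate-erase merge over flat weight vectors (illustration only)."""
    deltas = np.stack([t - pivot for t in targets])  # task vectors vs. the pivot
    # Select: keep only the top-k fraction of elements by cross-model variance.
    if select_topk < 1.0:
        var = deltas.var(axis=0)
        k = max(1, int(select_topk * deltas.shape[1]))
        keep = np.zeros(deltas.shape[1], dtype=bool)
        keep[np.argsort(var)[-k:]] = True
        deltas = deltas * keep
    # Calculate: per-model coefficients from the energy of the kept elements.
    energy = (deltas ** 2).sum(axis=1)
    coeff = energy / max(float(energy.sum()), 1e-12)
    # Erase: zero out elements whose sign disagrees with the majority sign.
    majority = np.sign(deltas.sum(axis=0))
    deltas = np.where(np.sign(deltas) == majority, deltas, 0.0)
    fused = (coeff[:, None] * deltas).sum(axis=0)
    return pivot + fused

pivot = np.zeros(4)
merged = sce_like_merge(pivot, [pivot + 1.0, pivot + 1.0])
# Identical targets fuse cleanly: result is pivot + 1 everywhere
```

In the recipe above there is a single target model and `select_topk: 1`, so the method effectively transplants the Qwen2.5-14B weights' contribution onto the 1M-context pivot.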
```yaml
models:
  - model: Qwen/Qwen2.5-14B-Instruct
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
  - model: Qwen/Qwen2.5-14B-Instruct-1M
    parameters:
      density: 1
      weight: 1
      lambda: 0.9
merge_method: della
base_model: Qwen2.5-14B-1M
parameters:
  density: 1
  weight: 1
  lambda: 0.9
normalize: true
int8_mask: true
dtype: bfloat16
tokenizer_source: base
name: Qwen2.5-14B-della-1M
```
## Final stage:

```yaml
merge_method: model_stock
base_model: Qwen2.5-14B-della-1M
models:
  - model: Qwen2.5-14B-mst-Coder
  - model: Qwen2.5-14B-mst-V6
  - model: Qwen2.5-14B-mst-Nova
  - model: Qwen2.5-14B-mst-it
dtype: bfloat16
tokenizer_source: base
int8_mask: true
normalize: true
name: ZYH-LLM-Qwen2.5-14B-V4
```