YOYO-AI/ZYH-LLM-Qwen2.5-14B-V4

@@ -22,4 +22,260 @@ tags:
 # ZYH-LLM-Qwen2.5-14B-V4
 *The fourth-generation model of ZYH-LLM-Qwen2.5 has been released!*
-*Increase the proportion of the **R1 distillation model** in the model merging recipe while maintaining the model's **instruction-following ability** and **general capabilities.***

 # ZYH-LLM-Qwen2.5-14B-V4
 *The fourth-generation model of ZYH-LLM-Qwen2.5 has been released!*
+*This release increases the proportion of the **R1 distillation model** in the merge recipe while maintaining the model's **instruction-following ability** and **general capabilities**.*
+## Merge Template
+```yaml
+merge_method: model_stock
+base_model: Instruction Model
+models:
+  - model: Instruction Fine-tuning Model 1
+  - model: Instruction Fine-tuning Model 2
+  - model: Reasoning Fine-tuning Model 1
+  - model: Reasoning Fine-tuning Model 2
+dtype: bfloat16
+tokenizer_source: base
+int8_mask: true
+normalize: true
+```
+Merging with the template above can improve the model's **calculation accuracy** and **reasoning ability** without reducing the **general capabilities** of the instruction model.
+**ZYH-LLM-Qwen2.5-14B-V4** uses this template in its merging process.
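As a rough illustration of why this template tends to preserve the base instruction model's behaviour, here is a minimal pure-Python sketch of the Model Stock interpolation rule on toy weight vectors. It assumes the rule as described in the Model Stock paper, extended to N models by averaging pairwise cosines. The function names `cosine` and `model_stock_merge` are hypothetical, and mergekit's real implementation works per tensor on full checkpoints and may differ in detail.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm > 0 else 0.0


def model_stock_merge(base, finetuned):
    """Interpolate between the base weights and the average fine-tuned weights.

    Assumed rule: t = N*cos / (1 + (N-1)*cos), merged = t*avg + (1-t)*base,
    where cos is the mean pairwise cosine between task vectors (model - base).
    """
    n = len(finetuned)
    if n < 2:
        raise ValueError("need at least two fine-tuned models")
    # Task vectors: how far each fine-tuned model moved away from the base.
    deltas = [[w - b for w, b in zip(model, base)] for model in finetuned]
    pairs = list(combinations(deltas, 2))
    cos = sum(cosine(u, v) for u, v in pairs) / len(pairs)
    denom = 1 + (n - 1) * cos
    if abs(denom) < 1e-12:
        raise ValueError("interpolation ratio is undefined for this angle")
    t = n * cos / denom
    avg = [sum(ws) / n for ws in zip(*finetuned)]
    return [t * a + (1 - t) * b for a, b in zip(avg, base)]


base = [0.0, 0.0]
print(model_stock_merge(base, [[1.0, 0.0], [0.0, 1.0]]))  # orthogonal task vectors
print(model_stock_merge(base, [[1.0, 1.0], [1.0, 1.0]]))  # identical task vectors
```

With orthogonal task vectors the ratio `t` is 0 and the merge collapses to the base model; the more the fine-tuned models agree, the closer `t` gets to 1 and the closer the merge gets to their plain average.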
+## First stage:
+*Create four different instruction models and one code model.*
+```yaml
+models:
+  - model: Qwen/Qwen2.5-14B-Instruct
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen/Qwen2.5-14B-Instruct-1M
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: Qwen/Qwen2.5-14B
+parameters:
+  density: 1
+  weight: 1
+  lambda: 0.9
+  normalize: true
+  int8_mask: true
+dtype: bfloat16
+tokenizer_source: base
+name: Qwen2.5-14B-della-base
+```
+```yaml
+models:
+  - model: Qwen/Qwen2.5-14B-Instruct
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen/Qwen2.5-14B-Instruct-1M
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: arcee-ai/Virtuoso-Small-v2
+parameters:
+  density: 1
+  weight: 1
+  lambda: 0.9
+  normalize: true
+  int8_mask: true
+dtype: bfloat16
+tokenizer_source: base
+name: Qwen2.5-14B-della-v2
+```
+```yaml
+models:
+  - model: Qwen/Qwen2.5-14B-Instruct
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen/Qwen2.5-14B-Instruct-1M
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: arcee-ai/SuperNova-Medius
+parameters:
+  density: 1
+  weight: 1
+  lambda: 0.9
+  normalize: true
+  int8_mask: true
+dtype: bfloat16
+tokenizer_source: base
+name: Qwen2.5-14B-della-Nova
+```
+```yaml
+models:
+  - model: Qwen/Qwen2.5-14B-Instruct
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen/Qwen2.5-14B-Instruct-1M
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: Azure99/Blossom-V6-14B
+parameters:
+  density: 1
+  weight: 1
+  lambda: 0.9
+  normalize: true
+  int8_mask: true
+dtype: bfloat16
+tokenizer_source: base
+name: Qwen2.5-14B-della-V6
+```
+```yaml
+models:
+  - model: Qwen/Qwen2.5-Coder-14B-Instruct
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: Qwen/Qwen2.5-Coder-14B
+parameters:
+  density: 1
+  weight: 1
+  lambda: 0.9
+  normalize: true
+  int8_mask: true
+dtype: bfloat16
+tokenizer_source: base
+name: Qwen2.5-Coder-14B-della
+```
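The four instruction configs above differ only in `base_model` and `name`, so they can be generated from a single helper. The sketch below is a hypothetical convenience script: `render_della_config` is not part of mergekit, and it simply renders the same YAML text with string formatting, assuming the parameter values shown above.

```python
# Parameter values copied from the first-stage configs above.
DELLA_PARAMS = {"density": 1, "weight": 1, "lambda": 0.9}


def render_della_config(base_model, name, models):
    """Return a della merge config as YAML text, mirroring the configs above."""
    lines = ["models:"]
    for model in models:
        lines.append(f"  - model: {model}")
        lines.append("    parameters:")
        for key, value in DELLA_PARAMS.items():
            lines.append(f"      {key}: {value}")
    lines += ["merge_method: della", f"base_model: {base_model}", "parameters:"]
    for key, value in DELLA_PARAMS.items():
        lines.append(f"  {key}: {value}")
    lines += [
        "  normalize: true",
        "  int8_mask: true",
        "dtype: bfloat16",
        "tokenizer_source: base",
        f"name: {name}",
    ]
    return "\n".join(lines) + "\n"


INSTRUCT_MODELS = ["Qwen/Qwen2.5-14B-Instruct", "Qwen/Qwen2.5-14B-Instruct-1M"]

# Output name -> base model, as listed in the first stage.
BASES = {
    "Qwen2.5-14B-della-base": "Qwen/Qwen2.5-14B",
    "Qwen2.5-14B-della-v2": "arcee-ai/Virtuoso-Small-v2",
    "Qwen2.5-14B-della-Nova": "arcee-ai/SuperNova-Medius",
    "Qwen2.5-14B-della-V6": "Azure99/Blossom-V6-14B",
}

configs = {
    name: render_della_config(base, name, INSTRUCT_MODELS)
    for name, base in BASES.items()
}
# The code model merges a single Coder instruct checkpoint onto the Coder base.
configs["Qwen2.5-Coder-14B-della"] = render_della_config(
    "Qwen/Qwen2.5-Coder-14B",
    "Qwen2.5-Coder-14B-della",
    ["Qwen/Qwen2.5-Coder-14B-Instruct"],
)

print(configs["Qwen2.5-14B-della-v2"])
```

Each rendered string can be written to its own file and passed to the merge tool; the script itself downloads and merges nothing.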
+## Second stage:
+### Step 1:
+*Create three reasoning-biased instruction models using the merge template.*
+```yaml
+merge_method: model_stock
+base_model: Qwen2.5-14B-della-base
+models:
+  - model: Qwen2.5-Coder-14B-della
+  - model: Qwen2.5-14B-della-v2
+  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
+dtype: bfloat16
+tokenizer_source: base
+int8_mask: true
+normalize: true
+name: Qwen2.5-14B-mst-Coder
+```
+```yaml
+merge_method: model_stock
+base_model: Qwen2.5-14B-della-base
+models:
+  - model: Qwen2.5-14B-della-V6
+  - model: Qwen2.5-14B-della-v2
+  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
+dtype: bfloat16
+tokenizer_source: base
+int8_mask: true
+normalize: true
+name: Qwen2.5-14B-mst-V6
+```
+```yaml
+merge_method: model_stock
+base_model: Qwen2.5-14B-della-base
+models:
+  - model: Qwen2.5-14B-della-Nova
+  - model: Qwen2.5-14B-della-v2
+  - model: deepseek-ai/DeepSeek-R1-Distill-Qwen-14B
+  - model: huihui-ai/DeepSeek-R1-Distill-Qwen-14B-abliterated-v2
+dtype: bfloat16
+tokenizer_source: base
+int8_mask: true
+normalize: true
+name: Qwen2.5-14B-mst-Nova
+```
+### Step 2:
+*Create a pure instruction model to restore the generality of the final model.*
+```yaml
+merge_method: model_stock
+base_model: Qwen2.5-14B-della-base
+models:
+  - model: Qwen2.5-14B-della-Nova
+  - model: Qwen2.5-14B-della-v2
+  - model: Qwen2.5-14B-della-V6
+dtype: bfloat16
+tokenizer_source: base
+int8_mask: true
+normalize: true
+name: Qwen2.5-14B-mst-it
+```
+## Third stage:
+*Create a base model with a 1-million-token context window.*
+```yaml
+merge_method: sce
+models:
+  # Pivot model
+  - model: Qwen/Qwen2.5-14B-Instruct-1M
+  # Target models
+  - model: Qwen/Qwen2.5-14B
+base_model: Qwen/Qwen2.5-14B-Instruct-1M
+parameters:
+  select_topk: 1
+dtype: bfloat16
+tokenizer_source: base
+normalize: true
+int8_mask: true
+name: Qwen2.5-14B-1M
+```
+```yaml
+models:
+  - model: Qwen/Qwen2.5-14B-Instruct
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+  - model: Qwen/Qwen2.5-14B-Instruct-1M
+    parameters:
+      density: 1
+      weight: 1
+      lambda: 0.9
+merge_method: della
+base_model: Qwen2.5-14B-1M
+parameters:
+  density: 1
+  weight: 1
+  lambda: 0.9
+  normalize: true
+  int8_mask: true
+dtype: bfloat16
+tokenizer_source: base
+name: Qwen2.5-14B-della-1M
+```
+## Final stage:
+```yaml
+merge_method: model_stock
+base_model: Qwen2.5-14B-della-1M
+models:
+  - model: Qwen2.5-14B-mst-Coder
+  - model: Qwen2.5-14B-mst-V6
+  - model: Qwen2.5-14B-mst-Nova
+  - model: Qwen2.5-14B-mst-it
+dtype: bfloat16
+tokenizer_source: base
+int8_mask: true
+normalize: true
+name: ZYH-LLM-Qwen2.5-14B-V4
+```
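Because later stages consume the outputs of earlier ones, the merges have to run in dependency order. The following sketch derives one valid order from the `name` and `model`/`base_model` fields in the configs above, using the standard-library `graphlib` module (Python 3.9+). The `DEPENDENCIES` table is a hand-written summary of those configs, not something mergekit produces.

```python
from graphlib import TopologicalSorter

# Each merge output maps to the intermediate merges it consumes.
# Hub checkpoints such as Qwen/Qwen2.5-14B are omitted: they need no build step.
DEPENDENCIES = {
    "Qwen2.5-14B-della-base": [],
    "Qwen2.5-14B-della-v2": [],
    "Qwen2.5-14B-della-Nova": [],
    "Qwen2.5-14B-della-V6": [],
    "Qwen2.5-Coder-14B-della": [],
    "Qwen2.5-14B-mst-Coder": [
        "Qwen2.5-14B-della-base",
        "Qwen2.5-Coder-14B-della",
        "Qwen2.5-14B-della-v2",
    ],
    "Qwen2.5-14B-mst-V6": [
        "Qwen2.5-14B-della-base",
        "Qwen2.5-14B-della-V6",
        "Qwen2.5-14B-della-v2",
    ],
    "Qwen2.5-14B-mst-Nova": [
        "Qwen2.5-14B-della-base",
        "Qwen2.5-14B-della-Nova",
        "Qwen2.5-14B-della-v2",
    ],
    "Qwen2.5-14B-mst-it": [
        "Qwen2.5-14B-della-base",
        "Qwen2.5-14B-della-Nova",
        "Qwen2.5-14B-della-v2",
        "Qwen2.5-14B-della-V6",
    ],
    "Qwen2.5-14B-1M": [],
    "Qwen2.5-14B-della-1M": ["Qwen2.5-14B-1M"],
    "ZYH-LLM-Qwen2.5-14B-V4": [
        "Qwen2.5-14B-della-1M",
        "Qwen2.5-14B-mst-Coder",
        "Qwen2.5-14B-mst-V6",
        "Qwen2.5-14B-mst-Nova",
        "Qwen2.5-14B-mst-it",
    ],
}

# static_order() yields every merge after all of the merges it depends on.
build_order = list(TopologicalSorter(DEPENDENCIES).static_order())
for step, name in enumerate(build_order, start=1):
    print(f"{step:2d}. {name}")
```

Any order that respects these dependencies works; the final model always comes last because every other merge feeds into it directly or indirectly.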