---
base_model:
- Qwen/Qwen2.5-7B-Instruct
- Qwen/Qwen2.5-7B
- Qwen/Qwen2.5-Math-7B
library_name: transformers
tags:
- mergekit
- merge
language:
- zho
- eng
- fra
- spa
- por
- deu
- ita
- rus
- jpn
- kor
- vie
- tha
- ara
---
# Qwen2.5-7B-Instruct-Math-dare-linear

This is a merge of pre-trained language models created using [mergekit](https://github.com/cg123/mergekit).

## Performance

| Metric                | Value |
|-----------------------|------:|
| GSM8k (zero-shot)     | 90.75 |
| HellaSwag (zero-shot) | 80.77 |
| MBPP (zero-shot)      | 63.08 |

## Merge Details
### Merge Method

This model was merged with the [Linear DARE](https://arxiv.org/abs/2311.03099) merge method, using [Qwen/Qwen2.5-7B](https://huggingface.co/Qwen/Qwen2.5-7B) as a base.
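
DARE (Drop And REscale) sparsifies each fine-tuned model's delta from the base by randomly keeping a `density` fraction of entries and rescaling the survivors by `1/density`, so the delta's expectation is preserved; the linear variant then mixes the pruned deltas by `weight` and scales the sum by `lambda` before adding it back to the base. Below is a minimal sketch of the idea on a single tensor; the function name and the normalization handling are illustrative assumptions, not mergekit's actual implementation:

```python
import torch

def dare_linear_merge(base, tuned, densities, weights, lam, normalize=True):
    """Illustrative Linear DARE merge of one parameter tensor.

    base:      base-model tensor (Qwen2.5-7B here)
    tuned:     list of fine-tuned tensors (Math, Instruct)
    densities: per-model fraction of delta entries to KEEP
    weights:   per-model linear mixing weights
    lam:       final scale on the merged delta (the `lambda` parameter)
    """
    if normalize:  # assumed reading of `normalize: 1.0`: weights rescaled to sum to 1
        total = sum(weights)
        weights = [w / total for w in weights]
    merged_delta = torch.zeros_like(base)
    for t, density, weight in zip(tuned, densities, weights):
        delta = t - base                           # task vector relative to the base
        mask = torch.rand_like(delta) < density    # Drop: keep each entry with prob. `density`
        delta = delta * mask / density             # REscale: preserve the delta's expectation
        merged_delta += weight * delta             # linear combination of pruned deltas
    return base + lam * merged_delta
```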

### Models Merged

The following models were included in the merge:
* [Qwen/Qwen2.5-7B-Instruct](https://huggingface.co/Qwen/Qwen2.5-7B-Instruct)
* [Qwen/Qwen2.5-Math-7B](https://huggingface.co/Qwen/Qwen2.5-Math-7B)

### Configuration

The following YAML configuration was used to produce this model:

```yaml
base_model: Qwen/Qwen2.5-7B
dtype: bfloat16
merge_method: dare_linear
parameters:
  lambda: 0.7484721287441042
  normalize: 1.0
slices:
- sources:
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-Math-7B
    parameters:
      density: 0.8456557088847347
      weight: 0.11064925820848412
  - layer_range: [0, 28]
    model: Qwen/Qwen2.5-7B-Instruct
    parameters:
      density: 0.5247829319933462
      weight: 0.6901952279079901
```
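
To reproduce the merge, save the config and pass it to mergekit's CLI, e.g. `mergekit-yaml config.yaml ./Qwen2.5-7B-Instruct-Math-dare-linear` (output path illustrative). The result loads like any other transformers causal LM; a minimal usage sketch, assuming a local copy of the merged weights and that the tokenizer carries Qwen's chat template:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Illustrative path: substitute the actual merged checkpoint or Hub repo id.
model_id = "./Qwen2.5-7B-Instruct-Math-dare-linear"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # matches the merge's dtype
    device_map="auto",
)

messages = [{"role": "user", "content": "A train travels 120 km in 1.5 hours. What is its average speed?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
outputs = model.generate(inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0][inputs.shape[-1]:], skip_special_tokens=True))
```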