lbourdois committed
Commit b02c74c · verified · 1 Parent(s): c31e394

Improve language tag


Hi! As the model is multilingual, this PR adds languages other than English to the language tag to improve referencing. Note that 29 languages are announced in the README, but only 13 are explicitly listed, so I was only able to add those 13 languages.

Files changed (1): README.md (+209 −197)
README.md CHANGED
@@ -1,197 +1,209 @@
  ---
  library_name: transformers
  tags:
  - unsloth
  - trl
  - grpo
  license: mit
  datasets:
  - eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1
  language:
- - en
+ - zho
+ - eng
+ - fra
+ - spa
+ - por
+ - deu
+ - ita
+ - rus
+ - jpn
+ - kor
+ - vie
+ - tha
+ - ara
  base_model:
  - Qwen/Qwen2.5-1.5B-Instruct
  ---

# Qwen2.5-1.5B-Instruct Fine-Tuned on CodeAlpaca-20K with DeepSeek Augmentation

## Model Overview

This model is a fine-tuned version of **Qwen2.5-1.5B-Instruct**, designed for **instruction-following and structured reasoning**. It is trained on an **enhanced CodeAlpaca-20K dataset**, incorporating **Chain-of-Thought (CoT) reasoning** augmented by **DeepSeek AI**.

### Key Features
- **Base Model:** Qwen2.5-1.5B-Instruct
- **Fine-Tuned On:** CodeAlpaca-20K enhanced with DeepSeek-V3
- **Optimized for:** Instruction-following, structured reasoning, and problem-solving
- **Fine-tuning method:** LoRA (Low-Rank Adaptation)
- **Inference-ready:** Available on **Hugging Face** and compatible with `llama.cpp`
- **Supports GGUF:** Optimized versions for **Q4_K_M, Q8_0, Q5_K_M, and FP16**

## Model Details

- **Developed by:** [Yiqiao Yin](https://www.y-yin.io/)
- **Model Type:** Causal Language Model (Text Generation)
- **Languages:** English (`en`)
- **License:** MIT License
- **Fine-tuned from:** `Qwen/Qwen2.5-1.5B-Instruct`
- **Training Library:** `transformers` + `unsloth` + `trl`
- **Quantization:** GGUF (`Q4_K_M`, `Q8_0`, `Q5_K_M`, `f16`)

🔗 **Hugging Face Repository:**
👉 [Fine-tuned Qwen2.5-1.5B-Instruct](https://huggingface.co/eagle0504/qwen-2_5-1_5b-instruct-using-codealpaca-20k-enhanced-v1)

## How to Use the Model

### Using `transformers` in Python
You may need to install `bitsandbytes` first:

```bash
pip install -U bitsandbytes
```

Then run inference with the following code:
```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import torch

# Load model and tokenizer
model_name = "eagle0504/qwen-2_5-1_5b-instruct-using-codealpaca-20k-enhanced-v1"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Move model to GPU if available
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)

# Example inference
question = "How do I implement a binary search algorithm in Python?"
inputs = tokenizer(question, return_tensors="pt").to(device)
output = model.generate(**inputs, max_length=200)

# Decode response
print(tokenizer.decode(output[0], skip_special_tokens=True))
```
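
Because this is an Instruct model, results are usually better when the prompt is wrapped in the model's chat template rather than passed as raw text. A minimal sketch, reusing the `tokenizer`, `model`, and `device` objects from the block above (`max_new_tokens=512` is an illustrative choice):

```python
# Wrap the question in the chat template the model was trained with
messages = [
    {"role": "user", "content": "How do I implement a binary search algorithm in Python?"}
]
prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)

inputs = tokenizer(prompt, return_tensors="pt").to(device)
output = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```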

## Running the Model with `llama.cpp`

### Step 1: Install `llama.cpp`
```sh
brew install llama.cpp
```

### Step 2: Download the Model
```sh
mkdir -p ~/llama_models && cd ~/llama_models
wget https://huggingface.co/eagle0504/qwen-2_5-1_5b-instruct-using-codealpaca-20k-enhanced-v1/resolve/main/q8_0.gguf
```
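
If you prefer to stay in Python, the same file can be fetched with `huggingface_hub` instead of `wget`; a minimal sketch:

```python
from huggingface_hub import hf_hub_download

# Download the Q8_0 GGUF file from the model repository
path = hf_hub_download(
    repo_id="eagle0504/qwen-2_5-1_5b-instruct-using-codealpaca-20k-enhanced-v1",
    filename="q8_0.gguf",
)
print(path)  # local cache path of the downloaded file
```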

### Step 3: Run the Model
```sh
llama-cli -m ~/llama_models/q8_0.gguf --interactive
```

Alternatively, `llama-cli` can pull the model directly from Hugging Face:

```sh
llama-cli -hf eagle0504/qwen-2_5-1_5b-instruct-using-codealpaca-20k-enhanced-v1:Q8_0
```

### Step 4: Test with a Prompt
```sh
llama-cli -m ~/llama_models/q8_0.gguf -p "Explain the differences between breadth-first search and depth-first search."
```
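
`llama.cpp` also ships `llama-server`, which exposes an OpenAI-compatible HTTP API. A minimal sketch, assuming the server was started with `llama-server -m ~/llama_models/q8_0.gguf --port 8080` and that the `requests` package is installed:

```python
import requests

# Query llama-server's OpenAI-compatible chat completions endpoint
resp = requests.post(
    "http://localhost:8080/v1/chat/completions",
    json={
        "messages": [
            {
                "role": "user",
                "content": "Explain the differences between breadth-first search and depth-first search.",
            }
        ],
        "max_tokens": 256,
    },
    timeout=120,
)
print(resp.json()["choices"][0]["message"]["content"])
```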

## Training Details

### Custom Reward

```python
def count_xml(text: str) -> float:
    """
    Calculates a reward based on the occurrence of certain XML tags and subtracts penalties for content after closing tags.

    Args:
        text (str): The text string to analyze for XML tag consistency.

    Returns:
        float: Total reward score based on XML tag occurrence and penalties.
    """
    count = 0.0
    if text.count("<think>\n") == 1:
        count += 0.125
    if text.count("\n</think>\n") == 1:
        count += 0.125
    if text.count("\n<answer>\n") == 1:
        count += 0.125
        count -= len(text.split("\n</answer>\n")[-1]) * 0.001
    if text.count("\n</answer>") == 1:
        count += 0.125
        count -= (len(text.split("\n</answer>")[-1]) - 1) * 0.001

    # Ensure `<think>` and `</think>` exist
    if "<think>" in text and "</think>" in text:
        count += 1.0  # Higher weight to ensure reasoning consistency
    else:
        count -= 1.0  # Penalize if missing

    return count
```

Each component contributes to the total reward **if conditions are met**:

| Condition | Reward |
|-----------|--------|
| `"<think>\n"` appears exactly **once** | **+0.125** |
| `"\n</think>\n"` appears exactly **once** | **+0.125** |
| `"\n<answer>\n"` appears exactly **once** | **+0.125** |
| `"\n</answer>"` appears exactly **once** | **+0.125** |
| Both `<think>` and `</think>` exist anywhere | **+1.0** |
| No extra text after `"</answer>"` | **No penalty** |

Total possible reward **before penalties**:
\[
0.125 + 0.125 + 0.125 + 0.125 + 1.0 = 1.5
\]

**Potential Penalties**
The function applies a penalty for **extra content after `"</answer>"`**:
\[
-\left( \text{length of extra text} \times 0.001 \right)
\]
If the **best case** occurs (i.e., **no extra content**), then:
- **Penalty = 0**
- **Final Reward = 1.5 (no deductions)**

---

**Best Possible Input Example**
This **ideal input** earns the highest possible reward:

```xml
<think>
Valid reasoning goes here.
</think>

<answer>
Correct final answer here.
</answer>
```
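
As a sanity check, running the ideal input above through `count_xml` (with a trailing newline after `</answer>`, so no post-answer penalty applies) reproduces the 1.5 maximum:

```python
ideal = (
    "<think>\n"
    "Valid reasoning goes here.\n"
    "</think>\n"
    "\n"
    "<answer>\n"
    "Correct final answer here.\n"
    "</answer>\n"
)

# Four +0.125 tag bonuses, +1.0 for the <think>...</think> pair, zero penalty
print(count_xml(ideal))  # 1.5
```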

This customized reward function encourages every answer to contain explicit reasoning. Because we know mathematically what the maximum reward should be, we can monitor it during the training process.

### Dataset Used
The model was fine-tuned on:
🔹 [`eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1`](https://huggingface.co/datasets/eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1)

This dataset contains:
- **20K augmented training samples**
- Features: `instruction`, `response`, `cot` (Chain-of-Thought); see the loading sketch below
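
A minimal sketch of loading and inspecting the dataset with the `datasets` library (the `train` split name is an assumption):

```python
from datasets import load_dataset

# Load the augmented CodeAlpaca dataset from the Hugging Face Hub
ds = load_dataset("eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1")
print(ds)  # splits and row counts

sample = ds["train"][0]  # assumes a "train" split
print(sample["instruction"])
print(sample["cot"])       # DeepSeek-generated chain of thought
print(sample["response"])
```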

### Training Configuration
- **Framework:** `transformers` + `unsloth` + `trl` (see the sketch after this list)
- **Optimization:** LoRA applied to QKV projections
- **Learning Rate:** `1e-6`
- **Optimizer:** AdamW (8-bit)
- **Precision:** Mixed (`bf16` or `fp16`)
- **Batch Size:** `8`
- **Max Sequence Length:** `1024`
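
For orientation, here is a minimal sketch of how these settings could map onto `trl`'s `GRPOTrainer` (the `grpo` tag above suggests this trainer). The reward wrapper, LoRA rank, output path, and the assumption that the dataset provides a `prompt` column are all illustrative; this is not the author's actual training script:

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import GRPOConfig, GRPOTrainer

# count_xml is the custom reward defined earlier in this card
def xml_reward(completions, **kwargs):
    # trl reward functions score a batch; assumes plain-text completions
    return [count_xml(c) for c in completions]

dataset = load_dataset(
    "eagle0504/augmented_codealpaca-20k-using-together-ai-deepseek-v1",
    split="train",
)

config = GRPOConfig(
    output_dir="qwen2.5-1.5b-grpo",   # illustrative output path
    learning_rate=1e-6,
    per_device_train_batch_size=8,
    optim="adamw_8bit",               # 8-bit AdamW
    bf16=True,                        # or fp16=True, depending on hardware
    max_completion_length=1024,
)

trainer = GRPOTrainer(
    model="Qwen/Qwen2.5-1.5B-Instruct",
    reward_funcs=[xml_reward],
    args=config,
    train_dataset=dataset,
    # LoRA on the QKV projections, per the configuration above
    peft_config=LoraConfig(
        r=16,
        lora_alpha=32,
        target_modules=["q_proj", "k_proj", "v_proj"],
        task_type="CAUSAL_LM",
    ),
)
trainer.train()
```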