End of training
README.md CHANGED
@@ -114,7 +114,7 @@ wandb_log_model:
 
 gradient_accumulation_steps: 2
 micro_batch_size: 2
-num_epochs:
+num_epochs: 2
 optimizer: adamw_8bit
 lr_scheduler: cosine
 learning_rate: 0.0002
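The hunk above pins num_epochs to 2. For context, the effective per-optimizer-step batch size implied by these settings is micro_batch_size × gradient_accumulation_steps; the short sketch below (assuming single-GPU training, which the diff does not state) cross-checks that against the step counts in the results table further down.

```python
# Hedged sketch: derive effective batch size and steps per epoch from the
# config above. Single-GPU training is an assumption, not stated in the card.
micro_batch_size = 2
gradient_accumulation_steps = 2
num_epochs = 2

effective_batch = micro_batch_size * gradient_accumulation_steps  # 4 samples/step
# The results table logs epoch 0.0039 at step 1, i.e. ~1/0.0039 steps per epoch.
steps_per_epoch = round(1 / 0.0039)          # ≈ 256
total_steps = steps_per_epoch * num_epochs   # ≈ 512, close to the final step 510
print(effective_batch, steps_per_epoch, total_steps)
```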
@@ -154,7 +154,7 @@ special_tokens:
 
 This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on the minpeter/xlam-function-calling-60k-hermes, the minpeter/xlam-irrelevance-7.5k-qwen2.5-72b-distill-hermes, the minpeter/hermes-function-calling-v1-jsonl and the minpeter/hermes-function-calling-v1-jsonl datasets.
 It achieves the following results on the evaluation set:
-- Loss: 0.
+- Loss: 0.3335
 
 ## Model description
 
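This hunk fills in the final evaluation loss (0.3335). Since the base model is a Llama-style instruct model fine-tuned on function-calling data, a minimal inference sketch could look like the following; the repo id is a placeholder (the diff does not name the model repository), and no usage code appears in the card itself.

```python
# Hedged usage sketch. "your-org/your-finetuned-model" is a placeholder for
# the actual repo id, which the diff does not show; the chat-template call is
# the standard transformers API for Llama-style instruct models.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "your-org/your-finetuned-model"  # placeholder, not from the card
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

messages = [{"role": "user", "content": "What is the weather in Paris?"}]
inputs = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)
output = model.generate(inputs, max_new_tokens=128)
print(tokenizer.decode(output[0][inputs.shape[-1]:], skip_special_tokens=True))
```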
@@ -182,16 +182,19 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs:
+- num_epochs: 2.0
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 0.5354 | 0.0039 | 1 | 0.7727 |
-| 0.
-| 0.
-| 0.
+| 0.4667 | 0.3327 | 85 | 0.3745 |
+| 0.1858 | 0.6654 | 170 | 0.3515 |
+| 0.5982 | 0.9980 | 255 | 0.3440 |
+| 0.1452 | 1.3288 | 340 | 0.3389 |
+| 0.2287 | 1.6614 | 425 | 0.3344 |
+| 0.1441 | 1.9941 | 510 | 0.3335 |
 
 
 ### Framework versions
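The hyperparameter list above pairs 8-bit AdamW with a cosine schedule and 10 warmup steps. Below is a self-contained sketch of that pairing using bitsandbytes and the transformers scheduler helper; the trainer wires this up internally, and the stand-in model plus the 510-step horizon (taken from the results table) are illustrative only.

```python
# Hedged sketch of the optimizer/scheduler settings listed in the card.
import torch
import bitsandbytes as bnb
from transformers import get_cosine_schedule_with_warmup

model = torch.nn.Linear(8, 8)  # stand-in for the fine-tuned network
optimizer = bnb.optim.AdamW8bit(
    model.parameters(),
    lr=2e-4,              # learning_rate: 0.0002
    betas=(0.9, 0.999),
    eps=1e-8,
)
scheduler = get_cosine_schedule_with_warmup(
    optimizer,
    num_warmup_steps=10,      # lr_scheduler_warmup_steps: 10
    num_training_steps=510,   # final step in the results table
)
```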