minpeter committed · verified · Commit 7da9ce6 · 1 Parent(s): bd4b67d

End of training

Files changed (1): README.md +9 -6
README.md CHANGED
@@ -114,7 +114,7 @@ wandb_log_model:
 
 gradient_accumulation_steps: 2
 micro_batch_size: 2
-num_epochs: 1
+num_epochs: 2
 optimizer: adamw_8bit
 lr_scheduler: cosine
 learning_rate: 0.0002
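
For readers checking the arithmetic behind these settings: the effective batch size per device per optimizer step is micro_batch_size × gradient_accumulation_steps. A back-of-the-envelope sketch in Python (assuming these are per-device values, as is typical for axolotl configs; the device count is not stated in this card):

```python
# Sanity-check of the batch settings in the hunk above.
micro_batch_size = 2             # sequences per forward pass (from the config)
gradient_accumulation_steps = 2  # forward passes per optimizer step (from the config)

# Effective sequences consumed per optimizer step, per device:
effective_batch = micro_batch_size * gradient_accumulation_steps  # 2 * 2 = 4

# The training-results table below pairs step 255 with epoch ~0.998 and
# step 510 with epoch ~1.994, i.e. roughly 255-256 optimizer steps per epoch.
steps_per_epoch = 255 / 0.9980
print(effective_batch, round(steps_per_epoch))  # -> 4 256
```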
@@ -154,7 +154,7 @@ special_tokens:
 
 This model is a fine-tuned version of [meta-llama/Llama-3.2-3B-Instruct](https://huggingface.co/meta-llama/Llama-3.2-3B-Instruct) on the minpeter/xlam-function-calling-60k-hermes, the minpeter/xlam-irrelevance-7.5k-qwen2.5-72b-distill-hermes, the minpeter/hermes-function-calling-v1-jsonl and the minpeter/hermes-function-calling-v1-jsonl datasets.
 It achieves the following results on the evaluation set:
-- Loss: 0.3492
+- Loss: 0.3335
 
 ## Model description
 
@@ -182,16 +182,19 @@ The following hyperparameters were used during training:
 - optimizer: Use OptimizerNames.ADAMW_8BIT with betas=(0.9,0.999) and epsilon=1e-08 and optimizer_args=No additional optimizer arguments
 - lr_scheduler_type: cosine
 - lr_scheduler_warmup_steps: 10
-- num_epochs: 1.0
+- num_epochs: 2.0
 
 ### Training results
 
 | Training Loss | Epoch | Step | Validation Loss |
 |:-------------:|:------:|:----:|:---------------:|
 | 0.5354 | 0.0039 | 1 | 0.7727 |
-| 0.4681 | 0.3327 | 85 | 0.3746 |
-| 0.1909 | 0.6654 | 170 | 0.3528 |
-| 0.6071 | 0.9980 | 255 | 0.3492 |
+| 0.4667 | 0.3327 | 85 | 0.3745 |
+| 0.1858 | 0.6654 | 170 | 0.3515 |
+| 0.5982 | 0.9980 | 255 | 0.3440 |
+| 0.1452 | 1.3288 | 340 | 0.3389 |
+| 0.2287 | 1.6614 | 425 | 0.3344 |
+| 0.1441 | 1.9941 | 510 | 0.3335 |
 
 
 ### Framework versions
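
As a rough illustration only, the optimizer and scheduler lines in this diff could be reproduced outside of axolotl with bitsandbytes' AdamW8bit and transformers' get_cosine_schedule_with_warmup. This is a minimal sketch, not the trainer's actual loop: the 510 total steps come from the last row of the results table, the dummy parameter stands in for the real model, and a CUDA-enabled bitsandbytes install is assumed.

```python
import torch
from bitsandbytes.optim import AdamW8bit  # assumes a CUDA bitsandbytes build
from transformers import get_cosine_schedule_with_warmup

# Dummy parameter standing in for the fine-tuned model's weights.
params = [torch.nn.Parameter(torch.zeros(1))]

# adamw_8bit with the betas/epsilon listed in the hyperparameter section.
optimizer = AdamW8bit(params, lr=2e-4, betas=(0.9, 0.999), eps=1e-8)

# Cosine decay with 10 warmup steps; 510 total optimizer steps matches the
# final row of the training-results table (epoch ~1.99).
scheduler = get_cosine_schedule_with_warmup(
    optimizer, num_warmup_steps=10, num_training_steps=510
)

for _ in range(510):
    # backward pass omitted; with no gradients the optimizer step is a no-op
    optimizer.step()
    scheduler.step()
```

Note that the commit itself only changes num_epochs from 1 to 2, so the cosine schedule stretches over 510 steps instead of 255, and the extra epoch is what lowers the final eval loss from 0.3492 to 0.3335.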