qnguyen3 committed
Commit 360a073 · verified · 1 Parent(s): 242d220

Update README.md

Files changed (1)
  1. README.md +2 -115
README.md CHANGED
@@ -14,120 +14,7 @@ model-index:
  <!-- This model card has been generated automatically according to the information the Trainer had access to. You
  should probably proofread and complete it, then remove this comment. -->
 
- [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
- <details><summary>See axolotl config</summary>
-
- axolotl version: `0.4.1`
- ```yaml
- base_model: Qwen/Qwen2.5-3B
- load_in_8bit: false
- load_in_4bit: false
- strict: false
-
- datasets:
-   - path: arcee-ai/eval_tome
-     type: sharegpt
-     conversation: chatml
-   - path: arcee-ai/math_code_5k_claude
-     type: sharegpt
-     conversation: chatml
-     split: validation
-   - path: Undi95/Capybara-ShareGPT
-     type: sharegpt
-     conversation: chatml
- dataset_prepared_path:
- val_set_size: 0.0
-
- sequence_len: 8192
- sample_packing: true
-
- lora_fan_in_fan_out:
- wandb_project: qwen2.5-3b-gelato
- wandb_entity:
- wandb_watch:
- wandb_name:
- wandb_log_model:
- output_dir: ./outputs/gelato-3b
- gradient_accumulation_steps: 8
- micro_batch_size: 2
- num_epochs: 4
- optimizer: adamw_bnb_8bit
- torchdistx_path:
- lr_scheduler: cosine
- learning_rate: 0.0002
- train_on_inputs: false
- group_by_length: false
- bf16: auto
- fp16:
- tf32: true
- gradient_checkpointing: true
- early_stopping_patience:
- resume_from_checkpoint:
- local_rank:
- logging_steps: 1
- xformers_attention:
- flash_attention: true
- gptq_groupsize:
- s2_attention:
- gptq_model_v1:
- warmup_steps: 50
- evals_per_epoch:
- saves_per_epoch: 1
- debug:
- deepspeed: deepspeed_configs/zero3_bf16_cpuoffload_params.json
- weight_decay: 0.1
- fsdp:
- fsdp_config:
- special_tokens:
-   eos_token: "<|im_end|>"
-   bos_token: "<|im_start|>"
-
- ```
-
- </details><br>
-
- # outputs/gelato-3b
-
- This model is a fine-tuned version of [Qwen/Qwen2.5-3B](https://huggingface.co/Qwen/Qwen2.5-3B) on the None dataset.
-
- ## Model description
-
- More information needed
-
- ## Intended uses & limitations
-
- More information needed
-
- ## Training and evaluation data
-
- More information needed
-
- ## Training procedure
-
- ### Training hyperparameters
-
- The following hyperparameters were used during training:
- - learning_rate: 0.0002
- - train_batch_size: 2
- - eval_batch_size: 2
- - seed: 42
- - distributed_type: multi-GPU
- - num_devices: 4
- - gradient_accumulation_steps: 8
- - total_train_batch_size: 64
- - total_eval_batch_size: 8
- - optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- - lr_scheduler_type: cosine
- - lr_scheduler_warmup_steps: 50
- - num_epochs: 4
-
- ### Training results
-
-
-
- ### Framework versions
-
- - Transformers 4.45.1
- - Pytorch 2.3.1+cu121
- - Datasets 2.21.0
- - Tokenizers 0.20.0
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/630430583926de1f7ec62c6b/L45Szb9WeV-K_bxS8aFoj.png)
+
+ ![image/png](https://cdn-uploads.huggingface.co/production/uploads/630430583926de1f7ec62c6b/GQtNdAaoXZXwf4noU883B.png)
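A consistency note on the hyperparameter summary removed above: the reported total_train_batch_size follows from the axolotl config as micro_batch_size × gradient_accumulation_steps × num_devices = 2 × 8 × 4 = 64, and likewise total_eval_batch_size = 2 × 4 = 8.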
 
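For readers picking up the model from this commit, a minimal inference sketch under stated assumptions: the repo id `qnguyen3/gelato-3b` is a hypothetical placeholder for wherever the checkpoint is published, and only standard transformers calls are used. The chat template corresponds to the ChatML format (`<|im_start|>`/`<|im_end|>`) set as special tokens in the removed config.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "qnguyen3/gelato-3b"  # hypothetical placeholder; substitute the actual checkpoint repo

tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype="auto", device_map="auto")

# The run trained on ChatML-formatted conversations (type: sharegpt,
# conversation: chatml), so prompts should go through the chat template
# rather than being passed as raw strings.
messages = [{"role": "user", "content": "Explain sample packing in one sentence."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

# <|im_end|> is the eos token in the config, so generation stops at end of turn.
output = model.generate(input_ids, max_new_tokens=128)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

Since the config sets train_on_inputs: false, the loss was computed only on assistant turns, which is one more reason untemplated prompts would sit outside the fine-tuning distribution.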