minpeter committed on
Commit a0bf091 · verified · 1 Parent(s): a4e463b

End of training

Files changed (3)
  1. README.md +116 -0
  2. generation_config.json +9 -0
  3. training_args.bin +1 -1
README.md ADDED
@@ -0,0 +1,116 @@
+ ---
+ library_name: transformers
+ license: llama3.2
+ base_model: meta-llama/Llama-3.2-1B
+ tags:
+ - axolotl
+ - generated_from_trainer
+ datasets:
+ - alpaca_data.json
+ model-index:
+ - name: Alpaca-Llama-3.2-3B-Instruct
+   results: []
+ ---
+
+ <!-- This model card has been generated automatically according to the information the Trainer had access to. You
+ should probably proofread and complete it, then remove this comment. -->
+
+ [<img src="https://raw.githubusercontent.com/axolotl-ai-cloud/axolotl/main/image/axolotl-badge-web.png" alt="Built with Axolotl" width="200" height="32"/>](https://github.com/axolotl-ai-cloud/axolotl)
+ <details><summary>See axolotl config</summary>
+
+ axolotl version: `0.6.0`
+ ```yaml
+ base_model: meta-llama/Llama-3.2-1B
+ model_type: LlamaForCausalLM
+ tokenizer_type: PreTrainedTokenizerFast
+
+ # load_in_4bit: true
+ # load_in_8bit: false
+ strict: false
+
+ save_safetensors: true
+ flash_attention: true
+
+ auto_resume_from_checkpoints: true
+ save_steps: 100
+
+ learning_rate: 5e-4
+ num_epochs: 3
+ micro_batch_size: 8
+ gradient_accumulation_steps: 4
+
+ hub_model_id: minpeter/Alpaca-Llama-3.2-3B-Instruct
+
+ dataset_processes: 5000
+
+ # chat_template: jinja
+ chat_template_jinja: |
+   {%- for message in messages %}
+   {{- '<|' + message['role'] + '|>\n' }}
+   {{- message['content'] + eos_token }}
+   {%- endfor %}
+   {%- if add_generation_prompt %}
+   {{- '<|assistant|>\n' }}
+   {%- endif %}
+
+ datasets:
+   - path: alpaca_data.json
+     type: alpaca
+
+ special_tokens:
+   pad_token: <pad>
+
+ optimizer: adamw_torch_fused
+ lr_scheduler: cosine
+
+ wandb_project: "axolotl"
+ wandb_entity: "kasfiekfs-e"
+ ```
+
+ </details><br>
+
+ # Alpaca-Llama-3.2-3B-Instruct
+
+ This model is a fine-tuned version of [meta-llama/Llama-3.2-1B](https://huggingface.co/meta-llama/Llama-3.2-1B) on the alpaca_data.json dataset.
+
+ ## Model description
+
+ More information needed
+
+ ## Intended uses & limitations
+
+ More information needed
+
+ ## Training and evaluation data
+
+ More information needed
+
+ ## Training procedure
+
+ ### Training hyperparameters
+
+ The following hyperparameters were used during training:
+ - learning_rate: 0.0005
+ - train_batch_size: 8
+ - eval_batch_size: 8
+ - seed: 42
+ - distributed_type: multi-GPU
+ - num_devices: 8
+ - gradient_accumulation_steps: 4
+ - total_train_batch_size: 256
+ - total_eval_batch_size: 64
+ - optimizer: ADAMW_TORCH_FUSED with betas=(0.9, 0.999) and epsilon=1e-08; no additional optimizer arguments
+ - lr_scheduler_type: cosine
+ - lr_scheduler_warmup_steps: 18
+ - num_epochs: 3
+
+ ### Training results
+
+
+
+ ### Framework versions
+
+ - Transformers 4.47.1
+ - Pytorch 2.5.1+cu124
+ - Datasets 3.2.0
+ - Tokenizers 0.21.0
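The config above defines a custom `<|role|>`-style chat template and pushes the checkpoint to `minpeter/Alpaca-Llama-3.2-3B-Instruct` (the `hub_model_id`). As a minimal, unverified sketch of calling such a checkpoint with Transformers — assuming the repository is public and the `chat_template_jinja` shown above was saved with the tokenizer — inference could look roughly like this:

```python
# Sketch only: the model id comes from hub_model_id in the axolotl config above;
# whether the custom chat template ships with the tokenizer is an assumption.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "minpeter/Alpaca-Llama-3.2-3B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

# The template renders each turn as "<|role|>\n<content><eos>" and, with
# add_generation_prompt=True, appends "<|assistant|>\n" for the model to continue.
messages = [{"role": "user", "content": "Give three tips for staying healthy."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```

For reference, the `total_train_batch_size: 256` in the hyperparameters list above follows from `micro_batch_size` 8 × `gradient_accumulation_steps` 4 × 8 GPUs.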
generation_config.json ADDED
@@ -0,0 +1,9 @@
+ {
+   "_from_model_config": true,
+   "bos_token_id": 128000,
+   "do_sample": true,
+   "eos_token_id": 128001,
+   "temperature": 0.6,
+   "top_p": 0.9,
+   "transformers_version": "4.47.1"
+ }
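generation_config.json supplies the default sampling settings (sampling enabled with temperature 0.6 and top_p 0.9) that `generate()` picks up automatically when this checkpoint is loaded. A small sketch of inspecting or overriding them, again assuming the `minpeter/Alpaca-Llama-3.2-3B-Instruct` repo id from the training config:

```python
# Sketch: read the generation defaults shipped with the checkpoint; the repo id
# is an assumption carried over from hub_model_id in the axolotl config.
from transformers import GenerationConfig

gen_cfg = GenerationConfig.from_pretrained("minpeter/Alpaca-Llama-3.2-3B-Instruct")
print(gen_cfg.do_sample, gen_cfg.temperature, gen_cfg.top_p)  # True 0.6 0.9

# Per-call arguments override these defaults, e.g. greedy decoding:
# model.generate(input_ids, do_sample=False, max_new_tokens=128)
```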
training_args.bin CHANGED
@@ -1,3 +1,3 @@
  version https://git-lfs.github.com/spec/v1
- oid sha256:1a7a40f1948d638b4c3d94115bced6662d2a636f1dd423a8f665b27475a5848e
+ oid sha256:f98044e6c330a616299f3995650e40b472f09a8381c8ef5107cd3ce35c26893a
  size 6520