---
license: llama2
base_model: meta-llama/CodeLlama-34b-Instruct-hf
tags:
- alignment-handbook
- generated_from_trainer
datasets:
- meng-lab/CodeLlama-34B-Instruct-humaneval
model-index:
- name: CodeLlama-34b-Instruct-sft-5e-3-epoch-100-human-eval-final
  results: []
---

[Visualize in Weights & Biases](https://wandb.ai/uva-llm/huggingface/runs/93nixyw1)

# CodeLlama-34b-Instruct-sft-5e-3-epoch-100-human-eval-final

This model is a fine-tuned version of [meta-llama/CodeLlama-34b-Instruct-hf](https://huggingface.co/meta-llama/CodeLlama-34b-Instruct-hf) on the meng-lab/CodeLlama-34B-Instruct-humaneval dataset.
It achieves the following results on the evaluation set:
- Loss: 3.7616
- Loss Layer 6 Head: 1.0709
- Loss Layer 12 Head: 0.8047
- Loss Layer 18 Head: 0.7212
- Loss Layer 24 Head: 0.4396
- Loss Layer 30 Head: 0.3042
- Loss Layer 36 Head: 0.2040
- Loss Layer 42 Head: 0.1346

## Model description

More information needed

## Intended uses & limitations

More information needed

## Training and evaluation data

More information needed

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.005
- train_batch_size: 1
- eval_batch_size: 2
- seed: 42
- distributed_type: multi-GPU
- num_devices: 8
- gradient_accumulation_steps: 16
- total_train_batch_size: 128
- total_eval_batch_size: 16
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: cosine
- lr_scheduler_warmup_ratio: 0.1
- num_epochs: 100

### Training results

| Training Loss | Epoch   | Step | Validation Loss | Loss Layer 6 Head | Loss Layer 12 Head | Loss Layer 18 Head | Loss Layer 24 Head | Loss Layer 30 Head | Loss Layer 36 Head | Loss Layer 42 Head |
|:-------------:|:-------:|:----:|:---------------:|:-----------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|:------------------:|
| 3.7028        | 9.1168  | 200  | 4.6352          | 1.4035            | 0.8311             | 1.0429             | 0.4931             | 0.4457             | 0.2349             | 0.1599             |
| 2.736         | 18.2336 | 400  | 4.7219          | 1.2158            | 0.8316             | 0.7490             | 1.0869             | 0.3238             | 0.2666             | 0.1723             |
| 2.0128        | 27.3504 | 600  | 3.8953          | 1.1598            | 0.8030             | 0.7230             | 0.4451             | 0.3500             | 0.2027             | 0.1459             |
| 3.3605        | 36.4672 | 800  | 4.9203          | 1.1038            | 0.8175             | 1.6655             | 0.4410             | 0.3091             | 0.2055             | 0.1365             |
| 2.5177        | 45.5840 | 1000 | 4.2388          | 1.0907            | 0.8042             | 1.1115             | 0.4403             | 0.3038             | 0.2217             | 0.1412             |
| 2.0743        | 54.7009 | 1200 | 3.9221          | 1.0727            | 0.8050             | 0.8689             | 0.4418             | 0.3012             | 0.2044             | 0.1362             |
| 1.8844        | 63.8177 | 1400 | 3.8140          | 1.0723            | 0.8028             | 0.7729             | 0.4389             | 0.3045             | 0.2036             | 0.1350             |
| 1.8019        | 72.9345 | 1600 | 3.7777          | 1.0726            | 0.8038             | 0.7376             | 0.4401             | 0.3042             | 0.2032             | 0.1345             |
| 1.7339        | 82.0513 | 1800 | 3.7662          | 1.0703            | 0.8056             | 0.7246             | 0.4394             | 0.3041             | 0.2042             | 0.1347             |
| 1.6981        | 91.1681 | 2000 | 3.7616          | 1.0709            | 0.8047             | 0.7212             | 0.4396             | 0.3042             | 0.2040             | 0.1346             |

### Framework versions

- Transformers 4.43.2
- Pytorch 2.5.1+cu124
- Datasets 3.2.0
- Tokenizers 0.19.1
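
The batch totals in the hyperparameter list follow from the per-device batch sizes, the device count, and gradient accumulation: 1 × 8 × 16 = 128 for training and 2 × 8 = 16 for evaluation. The sketch below reproduces that arithmetic and illustrates one common reading of `lr_scheduler_type: cosine` with `lr_scheduler_warmup_ratio: 0.1`, namely a linear warmup followed by a cosine decay to zero. The card does not spell out the schedule shape or the total number of optimizer steps, so the function names and `total_steps=1000` are hypothetical and chosen only for illustration.

```python
import math


def effective_batch_size(per_device: int, num_devices: int, grad_accum_steps: int = 1) -> int:
    """Samples consumed per optimizer step (or per eval step when grad_accum_steps=1)."""
    return per_device * num_devices * grad_accum_steps


def cosine_lr_with_warmup(step: int, total_steps: int,
                          base_lr: float = 0.005, warmup_ratio: float = 0.1) -> float:
    """Linear warmup to base_lr, then cosine decay to zero.

    Assumption: this mirrors the usual warmup-plus-cosine shape; it is not
    taken from the training code behind this card.
    """
    warmup_steps = round(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * 0.5 * (1.0 + math.cos(math.pi * progress))


# Values from the hyperparameter list above.
train_total = effective_batch_size(per_device=1, num_devices=8, grad_accum_steps=16)
eval_total = effective_batch_size(per_device=2, num_devices=8)

# total_steps is a placeholder; the card does not report it.
for s in (0, 50, 100, 550, 1000):
    print(s, cosine_lr_with_warmup(s, total_steps=1000))
```

With these inputs, `train_total` and `eval_total` match the `total_train_batch_size` and `total_eval_batch_size` entries reported in the card.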