zephyr-7b-mypo3_sim-full-beta10.0-lr4e-7

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-full on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 1.3702
  • Rewards/chosen: 0.0877
  • Rewards/rejected: -0.3857
  • Rewards/accuracies: 0.7480
  • Rewards/margins: 0.4734
  • Logps/rejected: -1.1662
  • Logps/chosen: -0.9702
  • Logits/rejected: -2.5842
  • Logits/chosen: -2.6366
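
Rewards/margins is the mean difference between the chosen and rejected rewards: 0.0877 − (−0.3857) ≈ 0.4734.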

Model description

More information needed

Intended uses & limitations

More information needed
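
No usage guidance was provided with this card. As a minimal inference sketch, assuming the repository id Kimory-X/zephyr-7b-mypo3_sim-full-beta10.0-lr4e-7 and the standard Zephyr chat template, the model can be loaded with transformers:

```python
# Minimal inference sketch (not from the original card): load the model and
# query it through the tokenizer's chat template.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Kimory-X/zephyr-7b-mypo3_sim-full-beta10.0-lr4e-7"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,  # the released weights are BF16
    device_map="auto",
)

messages = [
    {"role": "user", "content": "Explain preference optimization in two sentences."},
]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=256, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```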

Training and evaluation data

More information needed
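
The card leaves this blank, but the summary above names HuggingFaceH4/ultrafeedback_binarized. As a sketch, assuming that dataset's standard preference splits, the training pairs can be inspected with datasets:

```python
# Sketch: load the preference split of the dataset named above.
# "train_prefs" is the standard split name for this dataset; adjust if needed.
from datasets import load_dataset

ds = load_dataset("HuggingFaceH4/ultrafeedback_binarized", split="train_prefs")
example = ds[0]
print(example["prompt"])       # the instruction
print(example["chosen"][-1])   # preferred assistant turn
print(example["rejected"][-1]) # dispreferred assistant turn
```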

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a configuration sketch follows the list):

  • learning_rate: 4e-07
  • train_batch_size: 4
  • eval_batch_size: 8
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 4
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 64
  • total_eval_batch_size: 32
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
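
The training script itself is not included, and the mypo3_sim objective is not a stock TRL loss. Purely as an illustrative mapping, the hyperparameters above correspond to a TRL-style preference-optimization config like the following, with DPOConfig used as a stand-in and beta=10.0 taken from the model name:

```python
# Illustrative sketch only: maps the listed hyperparameters onto TRL's
# DPOConfig. The actual "mypo3_sim" objective is custom, so this is a
# stand-in, not the training script that produced this model.
from trl import DPOConfig

config = DPOConfig(
    output_dir="zephyr-7b-mypo3_sim-full-beta10.0-lr4e-7",
    learning_rate=4e-7,
    per_device_train_batch_size=4,  # 4 per GPU x 4 GPUs x 4 accumulation = 64 total
    per_device_eval_batch_size=8,   # 8 per GPU x 4 GPUs = 32 total
    gradient_accumulation_steps=4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    num_train_epochs=1,
    seed=42,
    bf16=True,   # the released weights are BF16
    beta=10.0,   # from the model name; vanilla DPO typically uses ~0.1
)
```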

Training results

| Training Loss | Epoch | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
| 1.3832 | 0.1047 | 100 | 1.3867 | 0.1162 | -0.1491 | 0.7361 | 0.2653 | -1.1426 | -0.9673 | -2.5828 | -2.6375 |
| 1.4139 | 0.2094 | 200 | 1.4360 | -0.0655 | -0.4534 | 0.7063 | 0.3879 | -1.1730 | -0.9855 | -2.6262 | -2.6797 |
| 1.3957 | 0.3141 | 300 | 1.3946 | 0.0975 | -0.3093 | 0.7202 | 0.4069 | -1.1586 | -0.9692 | -2.6662 | -2.7142 |
| 1.376 | 0.4187 | 400 | 1.3812 | 0.1541 | -0.2826 | 0.7242 | 0.4366 | -1.1559 | -0.9635 | -2.5936 | -2.6453 |
| 1.367 | 0.5234 | 500 | 1.3809 | 0.1133 | -0.3635 | 0.7361 | 0.4767 | -1.1640 | -0.9676 | -2.5923 | -2.6449 |
| 1.3654 | 0.6281 | 600 | 1.3839 | 0.1611 | -0.2773 | 0.7460 | 0.4384 | -1.1554 | -0.9628 | -2.6239 | -2.6742 |
| 1.3722 | 0.7328 | 700 | 1.3789 | 0.1067 | -0.3804 | 0.7381 | 0.4870 | -1.1657 | -0.9683 | -2.5906 | -2.6428 |
| 1.3544 | 0.8375 | 800 | 1.3720 | 0.0906 | -0.3825 | 0.7480 | 0.4731 | -1.1659 | -0.9699 | -2.5825 | -2.6349 |
| 1.3566 | 0.9422 | 900 | 1.3701 | 0.0867 | -0.3862 | 0.7480 | 0.4729 | -1.1663 | -0.9703 | -2.5839 | -2.6363 |
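
Across the run, the reward margin roughly doubles between step 100 and step 900 while validation loss plateaus near 1.37; the final checkpoint closely matches the evaluation results reported at the top of the card.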

Framework versions

  • Transformers 4.43.1
  • PyTorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1