zephyr-7b-ipo-qlora-lr5e6-beta0.1

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 17.7650
  • Rewards/chosen: -0.4722
  • Rewards/rejected: -0.6065
  • Rewards/accuracies: 0.7325
  • Rewards/margins: 0.1344
  • Logps/rejected: -7.1232
  • Logps/chosen: -5.6472
  • Logits/rejected: -0.8746
  • Logits/chosen: -0.8434
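
The reward columns follow the DPO/IPO convention: each "reward" is the β-scaled log-probability ratio of the policy against the reference (SFT) model, and IPO pushes the chosen-vs-rejected margin toward 1/(2β). A hedged sketch of the objective (per the IPO formulation as implemented in TRL; the notation is assumed, not taken from the training code):

$$
\mathcal{L}_{\mathrm{IPO}}(x, y_w, y_l) = \left( \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} - \frac{1}{2\beta} \right)^{2},
\qquad
r(y) = \beta \,\log\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

With β = 0.1, the target margin 1/(2β) is 5 in log-ratio units, so a reported Rewards/margins of 0.1344 corresponds to a raw log-ratio margin of about 1.34.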

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 5
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 40
  • total_eval_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
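
For reference, a run like this can be approximated with TRL's DPOTrainer using its IPO loss variant. The sketch below is an assumption-laden reconstruction, not the actual training script: the quantization settings, LoRA rank and target modules, dataset split names, and preprocessing are assumptions (the alignment-handbook recipes apply chat-template formatting to the dataset, which is omitted here), and depending on the TRL version `beta` and `loss_type` may need to be passed via `DPOConfig` rather than the trainer constructor.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer  # IPO is a loss variant of DPOTrainer in TRL

base = "alignment-handbook/zephyr-7b-sft-qlora"
tokenizer = AutoTokenizer.from_pretrained(base)

# 4-bit (QLoRA) loading; these quantization settings are assumptions.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
)

# LoRA rank and target modules are assumed, not taken from the training logs.
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# NOTE: the raw chosen/rejected fields are message lists; mapping them to
# prompt/chosen/rejected text columns is required and omitted here.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = TrainingArguments(
    output_dir="zephyr-7b-ipo-qlora-lr5e6-beta0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    beta=0.1,            # IPO beta
    loss_type="ipo",
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```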

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 24.1839       | 0.0654 | 100  | 24.1345         | -0.0190        | -0.0307          | 0.6325             | 0.0117          | -1.3653        | -1.1158      | -2.1858         | -2.2703       |
| 22.9327       | 0.1308 | 200  | 23.1023         | -0.0531        | -0.0855          | 0.6800             | 0.0324          | -1.9128        | -1.4563      | -2.1297         | -2.2081       |
| 19.7663       | 0.1963 | 300  | 20.8720         | -0.2046        | -0.2845          | 0.7125             | 0.0800          | -3.9032        | -2.9713      | -1.7473         | -1.7810       |
| 19.9497       | 0.2617 | 400  | 19.7683         | -0.3156        | -0.4116          | 0.6975             | 0.0960          | -5.1735        | -4.0811      | -1.5106         | -1.5203       |
| 19.9461       | 0.3271 | 500  | 19.0518         | -0.3438        | -0.4530          | 0.7200             | 0.1092          | -5.5879        | -4.3632      | -1.1895         | -1.1736       |
| 19.3961       | 0.3925 | 600  | 18.7156         | -0.3928        | -0.5156          | 0.7225             | 0.1229          | -6.2141        | -4.8532      | -1.0910         | -1.0781       |
| 18.0788       | 0.4580 | 700  | 18.5436         | -0.4816        | -0.6097          | 0.7200             | 0.1281          | -7.1548        | -5.7416      | -1.0175         | -0.9862       |
| 17.9856       | 0.5234 | 800  | 18.1134         | -0.4951        | -0.6334          | 0.7325             | 0.1383          | -7.3924        | -5.8769      | -0.8285         | -0.7817       |
| 17.0606       | 0.5888 | 900  | 18.0053         | -0.4471        | -0.5793          | 0.7400             | 0.1322          | -6.8514        | -5.3968      | -1.0137         | -0.9946       |
| 16.8834       | 0.6542 | 1000 | 17.8761         | -0.4469        | -0.5858          | 0.7300             | 0.1389          | -6.9156        | -5.3947      | -0.9112         | -0.8927       |
| 17.365        | 0.7197 | 1100 | 17.8090         | -0.4964        | -0.6335          | 0.7300             | 0.1372          | -7.3933        | -5.8892      | -0.8480         | -0.8176       |
| 17.1038       | 0.7851 | 1200 | 17.7880         | -0.4890        | -0.6243          | 0.7275             | 0.1353          | -7.3008        | -5.8158      | -0.8607         | -0.8294       |
| 17.1968       | 0.8505 | 1300 | 17.7733         | -0.4822        | -0.6176          | 0.7325             | 0.1354          | -7.2336        | -5.7477      | -0.8574         | -0.8249       |
| 18.234        | 0.9159 | 1400 | 17.7624         | -0.4722        | -0.6067          | 0.7375             | 0.1345          | -7.1248        | -5.6473      | -0.8714         | -0.8406       |
| 17.4585       | 0.9814 | 1500 | 17.7650         | -0.4722        | -0.6066          | 0.7325             | 0.1344          | -7.1238        | -5.6471      | -0.8746         | -0.8435       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1
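
With the versions above, the adapter can be loaded through PEFT's auto classes. A minimal usage sketch (the dtype, device map, generation settings, and prompt are assumptions; `AutoPeftModelForCausalLM` resolves the base model from the adapter config):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "Kimory-X/zephyr-7b-ipo-qlora-lr5e6-beta0.1"

# Load the base model referenced in the adapter config and attach the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Zephyr-style chat formatting via the tokenizer's chat template (assumed to be present).
messages = [{"role": "user", "content": "Summarize what IPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```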