zephyr-7b-ipo-qlora-lr5e6-beta0.1

This model is a fine-tuned version of alignment-handbook/zephyr-7b-sft-qlora on the HuggingFaceH4/ultrafeedback_binarized dataset. It achieves the following results on the evaluation set:

  • Loss: 17.7650
  • Rewards/chosen: -0.4722
  • Rewards/rejected: -0.6065
  • Rewards/accuracies: 0.7325
  • Rewards/margins: 0.1344
  • Logps/rejected: -7.1232
  • Logps/chosen: -5.6472
  • Logits/rejected: -0.8746
  • Logits/chosen: -0.8434
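
The reward columns follow the DPO/IPO convention: each "reward" is the β-scaled log-probability ratio of the policy against the reference (SFT) model, and IPO pushes the chosen-vs-rejected margin toward 1/(2β). A hedged sketch of the objective (per the IPO formulation as implemented in TRL; the notation is assumed, not taken from the training code):

$$
\mathcal{L}_{\mathrm{IPO}}(x, y_w, y_l) = \left( \log\frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)} - \log\frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)} - \frac{1}{2\beta} \right)^{2},
\qquad
r(y) = \beta \,\log\frac{\pi_\theta(y \mid x)}{\pi_{\mathrm{ref}}(y \mid x)}
$$

With β = 0.1, the target margin 1/(2β) is 5 in log-ratio units, so a reported Rewards/margins of 0.1344 corresponds to a raw log-ratio margin of about 1.34.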

Model description

More information needed

Intended uses & limitations

More information needed

Training and evaluation data

More information needed

Training procedure

Training hyperparameters

The following hyperparameters were used during training:

  • learning_rate: 5e-06
  • train_batch_size: 2
  • eval_batch_size: 4
  • seed: 42
  • distributed_type: multi-GPU
  • num_devices: 5
  • gradient_accumulation_steps: 4
  • total_train_batch_size: 40
  • total_eval_batch_size: 20
  • optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
  • lr_scheduler_type: cosine
  • lr_scheduler_warmup_ratio: 0.1
  • num_epochs: 1
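
For reference, a run like this can be approximated with TRL's DPOTrainer using its IPO loss variant. The sketch below is an assumption-laden reconstruction, not the actual training script: the quantization settings, LoRA rank and target modules, dataset split names, and preprocessing are assumptions (the alignment-handbook recipes apply chat-template formatting to the dataset, which is omitted here), and depending on the TRL version `beta` and `loss_type` may need to be passed via `DPOConfig` rather than the trainer constructor.

```python
import torch
from datasets import load_dataset
from peft import LoraConfig
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          BitsAndBytesConfig, TrainingArguments)
from trl import DPOTrainer  # IPO is a loss variant of DPOTrainer in TRL

base = "alignment-handbook/zephyr-7b-sft-qlora"
tokenizer = AutoTokenizer.from_pretrained(base)

# 4-bit (QLoRA) loading; these quantization settings are assumptions.
model = AutoModelForCausalLM.from_pretrained(
    base,
    quantization_config=BitsAndBytesConfig(
        load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16
    ),
    torch_dtype=torch.bfloat16,
)

# LoRA rank and target modules are assumed, not taken from the training logs.
peft_config = LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)

# NOTE: the raw chosen/rejected fields are message lists; mapping them to
# prompt/chosen/rejected text columns is required and omitted here.
dataset = load_dataset("HuggingFaceH4/ultrafeedback_binarized")

args = TrainingArguments(
    output_dir="zephyr-7b-ipo-qlora-lr5e6-beta0.1",
    learning_rate=5e-6,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=4,
    gradient_accumulation_steps=4,
    num_train_epochs=1,
    lr_scheduler_type="cosine",
    warmup_ratio=0.1,
    seed=42,
    bf16=True,
)

trainer = DPOTrainer(
    model,
    ref_model=None,      # with a PEFT adapter, the frozen base acts as the reference
    args=args,
    beta=0.1,            # IPO beta
    loss_type="ipo",
    train_dataset=dataset["train_prefs"],
    eval_dataset=dataset["test_prefs"],
    tokenizer=tokenizer,
    peft_config=peft_config,
)
trainer.train()
```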

Training results

| Training Loss | Epoch  | Step | Validation Loss | Rewards/chosen | Rewards/rejected | Rewards/accuracies | Rewards/margins | Logps/rejected | Logps/chosen | Logits/rejected | Logits/chosen |
|:-------------:|:------:|:----:|:---------------:|:--------------:|:----------------:|:------------------:|:---------------:|:--------------:|:------------:|:---------------:|:-------------:|
| 24.1839       | 0.0654 | 100  | 24.1345         | -0.0190        | -0.0307          | 0.6325             | 0.0117          | -1.3653        | -1.1158      | -2.1858         | -2.2703       |
| 22.9327       | 0.1308 | 200  | 23.1023         | -0.0531        | -0.0855          | 0.6800             | 0.0324          | -1.9128        | -1.4563      | -2.1297         | -2.2081       |
| 19.7663       | 0.1963 | 300  | 20.8720         | -0.2046        | -0.2845          | 0.7125             | 0.0800          | -3.9032        | -2.9713      | -1.7473         | -1.7810       |
| 19.9497       | 0.2617 | 400  | 19.7683         | -0.3156        | -0.4116          | 0.6975             | 0.0960          | -5.1735        | -4.0811      | -1.5106         | -1.5203       |
| 19.9461       | 0.3271 | 500  | 19.0518         | -0.3438        | -0.4530          | 0.7200             | 0.1092          | -5.5879        | -4.3632      | -1.1895         | -1.1736       |
| 19.3961       | 0.3925 | 600  | 18.7156         | -0.3928        | -0.5156          | 0.7225             | 0.1229          | -6.2141        | -4.8532      | -1.0910         | -1.0781       |
| 18.0788       | 0.4580 | 700  | 18.5436         | -0.4816        | -0.6097          | 0.7200             | 0.1281          | -7.1548        | -5.7416      | -1.0175         | -0.9862       |
| 17.9856       | 0.5234 | 800  | 18.1134         | -0.4951        | -0.6334          | 0.7325             | 0.1383          | -7.3924        | -5.8769      | -0.8285         | -0.7817       |
| 17.0606       | 0.5888 | 900  | 18.0053         | -0.4471        | -0.5793          | 0.7400             | 0.1322          | -6.8514        | -5.3968      | -1.0137         | -0.9946       |
| 16.8834       | 0.6542 | 1000 | 17.8761         | -0.4469        | -0.5858          | 0.7300             | 0.1389          | -6.9156        | -5.3947      | -0.9112         | -0.8927       |
| 17.365        | 0.7197 | 1100 | 17.8090         | -0.4964        | -0.6335          | 0.7300             | 0.1372          | -7.3933        | -5.8892      | -0.8480         | -0.8176       |
| 17.1038       | 0.7851 | 1200 | 17.7880         | -0.4890        | -0.6243          | 0.7275             | 0.1353          | -7.3008        | -5.8158      | -0.8607         | -0.8294       |
| 17.1968       | 0.8505 | 1300 | 17.7733         | -0.4822        | -0.6176          | 0.7325             | 0.1354          | -7.2336        | -5.7477      | -0.8574         | -0.8249       |
| 18.234        | 0.9159 | 1400 | 17.7624         | -0.4722        | -0.6067          | 0.7375             | 0.1345          | -7.1248        | -5.6473      | -0.8714         | -0.8406       |
| 17.4585       | 0.9814 | 1500 | 17.7650         | -0.4722        | -0.6066          | 0.7325             | 0.1344          | -7.1238        | -5.6471      | -0.8746         | -0.8435       |

Framework versions

  • PEFT 0.10.0
  • Transformers 4.43.1
  • Pytorch 2.1.2+cu121
  • Datasets 2.18.0
  • Tokenizers 0.19.1
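
With the versions above, the adapter can be loaded through PEFT's auto classes. A minimal usage sketch (the dtype, device map, generation settings, and prompt are assumptions; `AutoPeftModelForCausalLM` resolves the base model from the adapter config):

```python
import torch
from peft import AutoPeftModelForCausalLM
from transformers import AutoTokenizer

adapter_id = "Kimory-X/zephyr-7b-ipo-qlora-lr5e6-beta0.1"

# Load the base model referenced in the adapter config and attach the LoRA weights.
model = AutoPeftModelForCausalLM.from_pretrained(
    adapter_id, torch_dtype=torch.bfloat16, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained(adapter_id)

# Zephyr-style chat formatting via the tokenizer's chat template (assumed to be present).
messages = [{"role": "user", "content": "Summarize what IPO fine-tuning does."}]
input_ids = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True, return_tensors="pt"
).to(model.device)

output = model.generate(input_ids, max_new_tokens=128, do_sample=False)
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```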