roberta-base-unified-mcqa: 4-choice
This model is a fine-tuned version of roberta-base on the unified-mcqa dataset (4-choice config). It achieves the following results on the evaluation set:
- Loss: 0.5534
- Accuracy: 0.8030
- Num Input Tokens Seen: 2785906024
Intended uses & limitations
The goal is to see whether training on general MCQ data (A) helps GLUE evals and (B) results in a better base model than the plain MLM output. Note that the hosted HF Inference API does not support multiple-choice models from the transformers library, so inference has to be run locally.
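A minimal local-usage sketch, not part of the original card: it loads the checkpoint with AutoModelForMultipleChoice and scores a made-up 4-choice question (the example question and choices are illustrative only).

```python
# Sketch of local inference for a 4-choice question; the question/choices are invented.
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_id = "pszemraj/roberta-base-unified-mcqa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMultipleChoice.from_pretrained(model_id)

question = "What gas do plants primarily absorb for photosynthesis?"
choices = ["Oxygen", "Carbon dioxide", "Nitrogen", "Hydrogen"]

# Pair the question with each choice; the multiple-choice head scores all four jointly.
enc = tokenizer([question] * len(choices), choices,
                return_tensors="pt", padding=True, truncation=True)
inputs = {k: v.unsqueeze(0) for k, v in enc.items()}  # (1, num_choices, seq_len)

with torch.no_grad():
    logits = model(**inputs).logits  # (1, num_choices)

print(choices[logits.argmax(dim=-1).item()])
```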
Training procedure
Training hyperparameters
The following hyperparameters were used during training (see the configuration sketch after the list):
- learning_rate: 5e-05
- train_batch_size: 8
- eval_batch_size: 8
- seed: 69
- gradient_accumulation_steps: 8
- total_train_batch_size: 64
- optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
- lr_scheduler_type: linear
- lr_scheduler_warmup_steps: 300
- num_epochs: 3.0
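A minimal sketch of a TrainingArguments configuration matching the list above. Only the hyperparameters come from the card; the output_dir and any omitted arguments are assumptions.

```python
# Sketch only: hyperparameters from the card, everything else assumed.
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-base-unified-mcqa",  # assumed output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,           # 8 x 8 = 64 effective train batch size
    num_train_epochs=3.0,
    lr_scheduler_type="linear",
    warmup_steps=300,
    seed=69,
    optim="adamw_torch_fused",               # betas=(0.9, 0.999), eps=1e-08 are the defaults
)
```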
Training results
Training Loss | Epoch | Step | Validation Loss | Accuracy | Input Tokens Seen |
---|---|---|---|---|---|
0.9531 | 0.1189 | 1000 | 0.8328 | 0.6370 | 111443072 |
0.8363 | 0.2377 | 2000 | 0.7918 | 0.6720 | 222788512 |
0.7689 | 0.3566 | 3000 | 0.7457 | 0.6940 | 334128480 |
0.8036 | 0.4754 | 4000 | 0.7429 | 0.6940 | 445377152 |
0.7349 | 0.5943 | 5000 | 0.7252 | 0.7050 | 556965376 |
0.7721 | 0.7131 | 6000 | 0.7102 | 0.7130 | 668132544 |
0.6532 | 0.8320 | 7000 | 0.6958 | 0.7230 | 779523488 |
0.6842 | 0.9509 | 8000 | 0.6609 | 0.7230 | 891149056 |
0.576 | 1.0696 | 9000 | 0.6887 | 0.7360 | 1002658088 |
0.6265 | 1.1885 | 10000 | 0.6730 | 0.7520 | 1114316936 |
0.5256 | 1.3074 | 11000 | 0.6860 | 0.7550 | 1225691432 |
0.5701 | 1.4262 | 12000 | 0.6487 | 0.7530 | 1337160232 |
0.4803 | 1.5451 | 13000 | 0.6306 | 0.7580 | 1448480392 |
0.5155 | 1.6639 | 14000 | 0.5834 | 0.7800 | 1560022824 |
0.5221 | 1.7828 | 15000 | 0.6005 | 0.7850 | 1671544872 |
0.4736 | 1.9016 | 16000 | 0.5796 | 0.7820 | 1782692648 |
0.3577 | 2.0204 | 17000 | 0.5753 | 0.7870 | 1893957800 |
0.3656 | 2.1393 | 18000 | 0.6014 | 0.7930 | 2005395624 |
0.3722 | 2.2582 | 19000 | 0.6108 | 0.7900 | 2117111816 |
0.3599 | 2.3770 | 20000 | 0.5826 | 0.8000 | 2228698440 |
0.2723 | 2.4959 | 21000 | 0.5845 | 0.7910 | 2340181736 |
0.2817 | 2.6147 | 22000 | 0.5732 | 0.7840 | 2451744808 |
0.2402 | 2.7336 | 23000 | 0.5544 | 0.7980 | 2563194408 |
0.3318 | 2.8524 | 24000 | 0.5542 | 0.8000 | 2674427656 |
0.272 | 2.9713 | 25000 | 0.5534 | 0.8030 | 2785906024 |
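The Accuracy column is standard multiple-choice accuracy: the argmax over the four choice logits compared against the gold index. A minimal sketch of that metric, assuming a NumPy-style helper rather than the card's actual evaluation code:

```python
# Hedged sketch of the accuracy metric for a multiple-choice head.
import numpy as np

def mcqa_accuracy(logits: np.ndarray, labels: np.ndarray) -> float:
    """logits: (num_examples, num_choices); labels: (num_examples,) gold choice indices."""
    preds = logits.argmax(axis=-1)
    return float((preds == labels).mean())

# Example: the second choice (index 1) has the highest logit and is the gold label.
print(mcqa_accuracy(np.array([[0.1, 2.3, -0.5, 0.0]]), np.array([1])))  # 1.0
```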
Framework versions
- Transformers 4.51.3
- Pytorch 2.6.0+cu124
- Datasets 3.5.0
- Tokenizers 0.21.1