roberta-base-unified-mcqa: 4-choice

This model is a fine-tuned version of roberta-base on the unified-mcqa dataset (4-choice config). It achieves the following results on the evaluation set:

  • Loss: 0.5534
  • Accuracy: 0.8030
  • Num Input Tokens Seen: 2785906024
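
For quick testing, the snippet below is a minimal usage sketch (not part of the original card); it assumes the checkpoint carries the standard RobertaForMultipleChoice head and a question-plus-four-choices input format:

```python
import torch
from transformers import AutoTokenizer, AutoModelForMultipleChoice

model_id = "pszemraj/roberta-base-unified-mcqa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForMultipleChoice.from_pretrained(model_id)

# Illustrative example; the actual prompt format depends on the unified-mcqa preprocessing.
question = "What is the capital of France?"
choices = ["Berlin", "Madrid", "Paris", "Rome"]

# Each (question, choice) pair becomes one row; the model scores all four jointly.
encoded = tokenizer(
    [question] * len(choices),
    choices,
    padding=True,
    truncation=True,
    return_tensors="pt",
)
# The multiple-choice head expects shape (batch_size, num_choices, seq_len).
inputs = {k: v.unsqueeze(0) for k, v in encoded.items()}

with torch.no_grad():
    logits = model(**inputs).logits  # shape (1, 4)
print(choices[logits.argmax(dim=-1).item()])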

Intended uses & limitations

The goal is to see whether training on general MCQ data A) helps on GLUE evaluations and B) yields a better base model for further fine-tuning than the plain MLM checkpoint.
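
As a sketch of the second use case, the checkpoint can be loaded as a drop-in replacement for roberta-base when fine-tuning on a downstream classification task; the task and label count below are illustrative and not taken from the card:

```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_id = "pszemraj/roberta-base-unified-mcqa"
tokenizer = AutoTokenizer.from_pretrained(model_id)
# The multiple-choice head is discarded and a fresh 2-label classification head
# is randomly initialized (e.g. for a GLUE task like SST-2); fine-tune as usual.
model = AutoModelForSequenceClassification.from_pretrained(model_id, num_labels=2)
```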

Training procedure

Training hyperparameters

The following hyperparameters were used during training (a TrainingArguments sketch mirroring them follows the list):

  • learning_rate: 5e-05
  • train_batch_size: 8
  • eval_batch_size: 8
  • seed: 69
  • gradient_accumulation_steps: 8
  • total_train_batch_size: 64
  • optimizer: adamw_torch_fused with betas=(0.9, 0.999), epsilon=1e-08, and no additional optimizer arguments
  • lr_scheduler_type: linear
  • lr_scheduler_warmup_steps: 300
  • num_epochs: 3.0
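
The configuration above maps onto transformers.TrainingArguments roughly as follows. This is a reconstruction for reference only; the original training script is not included with the card, so treat it as an approximation:

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="roberta-base-unified-mcqa",  # placeholder output path
    learning_rate=5e-5,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    gradient_accumulation_steps=8,   # effective train batch size of 64
    seed=69,
    optim="adamw_torch_fused",
    adam_beta1=0.9,
    adam_beta2=0.999,
    adam_epsilon=1e-8,
    lr_scheduler_type="linear",
    warmup_steps=300,
    num_train_epochs=3.0,
)
```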

Training results

| Training Loss | Epoch  | Step  | Validation Loss | Accuracy | Input Tokens Seen |
|:-------------:|:------:|:-----:|:---------------:|:--------:|:-----------------:|
| 0.9531        | 0.1189 | 1000  | 0.8328          | 0.6370   | 111443072         |
| 0.8363        | 0.2377 | 2000  | 0.7918          | 0.6720   | 222788512         |
| 0.7689        | 0.3566 | 3000  | 0.7457          | 0.6940   | 334128480         |
| 0.8036        | 0.4754 | 4000  | 0.7429          | 0.6940   | 445377152         |
| 0.7349        | 0.5943 | 5000  | 0.7252          | 0.7050   | 556965376         |
| 0.7721        | 0.7131 | 6000  | 0.7102          | 0.7130   | 668132544         |
| 0.6532        | 0.8320 | 7000  | 0.6958          | 0.7230   | 779523488         |
| 0.6842        | 0.9509 | 8000  | 0.6609          | 0.7230   | 891149056         |
| 0.576         | 1.0696 | 9000  | 0.6887          | 0.7360   | 1002658088        |
| 0.6265        | 1.1885 | 10000 | 0.6730          | 0.7520   | 1114316936        |
| 0.5256        | 1.3074 | 11000 | 0.6860          | 0.7550   | 1225691432        |
| 0.5701        | 1.4262 | 12000 | 0.6487          | 0.7530   | 1337160232        |
| 0.4803        | 1.5451 | 13000 | 0.6306          | 0.7580   | 1448480392        |
| 0.5155        | 1.6639 | 14000 | 0.5834          | 0.7800   | 1560022824        |
| 0.5221        | 1.7828 | 15000 | 0.6005          | 0.7850   | 1671544872        |
| 0.4736        | 1.9016 | 16000 | 0.5796          | 0.7820   | 1782692648        |
| 0.3577        | 2.0204 | 17000 | 0.5753          | 0.7870   | 1893957800        |
| 0.3656        | 2.1393 | 18000 | 0.6014          | 0.7930   | 2005395624        |
| 0.3722        | 2.2582 | 19000 | 0.6108          | 0.7900   | 2117111816        |
| 0.3599        | 2.3770 | 20000 | 0.5826          | 0.8000   | 2228698440        |
| 0.2723        | 2.4959 | 21000 | 0.5845          | 0.7910   | 2340181736        |
| 0.2817        | 2.6147 | 22000 | 0.5732          | 0.7840   | 2451744808        |
| 0.2402        | 2.7336 | 23000 | 0.5544          | 0.7980   | 2563194408        |
| 0.3318        | 2.8524 | 24000 | 0.5542          | 0.8000   | 2674427656        |
| 0.272         | 2.9713 | 25000 | 0.5534          | 0.8030   | 2785906024        |

Framework versions

  • Transformers 4.51.3
  • Pytorch 2.6.0+cu124
  • Datasets 3.5.0
  • Tokenizers 0.21.1