mT5-Small (SIB-200 Maltese)

This model is a fine-tuned version of google/mt5-small on the mlt_Latn (Maltese) subset of the Davlan/sib200 dataset. It achieves the following results on the test set:

  • Loss: 1.2509
  • F1: 0.7679

Intended uses & limitations

The model is fine-tuned for a specific task (sentence-level topic classification on SIB-200) and should be used for the same or a similar task. Any limitations present in the base model are inherited.
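
A minimal usage sketch is shown below. It assumes the model takes a raw Maltese sentence as input and generates the topic label as text, which is a common seq2seq classification setup; the exact input format and label strings depend on the fine-tuning script, so treat this as illustrative rather than the exact pipeline used.

```python
# Minimal inference sketch (illustrative; not the original training/evaluation script).
# Assumes the model generates the SIB-200 topic label as text for a raw Maltese sentence.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "MLRS/mt5-small_sib200-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Il-futbol huwa l-isport l-aktar popolari f'Malta."  # illustrative Maltese sentence (sports-related)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected: a topic label such as "sports"
```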

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training (an illustrative sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 200.0
  • early_stopping_patience: 20
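
Since the customised script is not published here, the snippet below is only a rough sketch of how these hyperparameters might map onto standard Hugging Face Seq2SeqTrainingArguments; the output_dir, evaluation/saving strategies, metric name, and early-stopping callback wiring are assumptions, not the authors' code.

```python
# Rough, assumed mapping of the reported hyperparameters onto Hugging Face Trainer
# arguments; the actual customised script may differ in structure and defaults.
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_sib200-mlt",  # hypothetical output path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",              # evaluate (and save) once per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,        # needed for early stopping
    metric_for_best_model="f1",         # assumes the eval metric is logged as "f1"
)

# Early stopping with a patience of 20 evaluations would be attached as a callback:
# Seq2SeqTrainer(..., args=training_args,
#                callbacks=[EarlyStoppingCallback(early_stopping_patience=20)])
```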

Training results

| Training Loss | Epoch | Step | Validation Loss | F1 |
|:--------------|:------|:-----|:----------------|:-------|
| No log | 1.0 | 22 | 3.0282 | 0.32 |
| No log | 2.0 | 44 | 1.0814 | 0.4516 |
| No log | 3.0 | 66 | 0.7254 | 0.5333 |
| No log | 4.0 | 88 | 0.7760 | 0.5396 |
| No log | 5.0 | 110 | 0.6758 | 0.6601 |
| No log | 6.0 | 132 | 0.5517 | 0.6611 |
| No log | 7.0 | 154 | 0.5050 | 0.6943 |
| No log | 8.0 | 176 | 0.4567 | 0.6962 |
| No log | 9.0 | 198 | 0.4953 | 0.7500 |
| No log | 10.0 | 220 | 0.5387 | 0.7282 |
| No log | 11.0 | 242 | 0.6183 | 0.7326 |
| No log | 12.0 | 264 | 0.6277 | 0.6987 |
| No log | 13.0 | 286 | 0.5040 | 0.7696 |
| No log | 14.0 | 308 | 0.6528 | 0.7812 |
| No log | 15.0 | 330 | 0.5787 | 0.7209 |
| No log | 16.0 | 352 | 0.6691 | 0.7653 |
| No log | 17.0 | 374 | 0.6764 | 0.7445 |
| No log | 18.0 | 396 | 0.8734 | 0.7762 |
| No log | 19.0 | 418 | 1.3617 | 0.7327 |
| No log | 20.0 | 440 | 1.2030 | 0.7778 |
| No log | 21.0 | 462 | 1.1081 | 0.7680 |
| No log | 22.0 | 484 | 1.2747 | 0.7693 |
| 0.9564 | 23.0 | 506 | 0.9481 | 0.7459 |
| 0.9564 | 24.0 | 528 | 0.8424 | 0.7921 |
| 0.9564 | 25.0 | 550 | 1.2598 | 0.7872 |
| 0.9564 | 26.0 | 572 | 1.0477 | 0.7792 |
| 0.9564 | 27.0 | 594 | 0.8808 | 0.7475 |
| 0.9564 | 28.0 | 616 | 1.1499 | 0.7736 |
| 0.9564 | 29.0 | 638 | 1.0573 | 0.7944 |
| 0.9564 | 30.0 | 660 | 1.0368 | 0.7934 |
| 0.9564 | 31.0 | 682 | 1.1419 | 0.7547 |
| 0.9564 | 32.0 | 704 | 1.3188 | 0.7738 |
| 0.9564 | 33.0 | 726 | 1.1331 | 0.7661 |
| 0.9564 | 34.0 | 748 | 1.6081 | 0.7578 |
| 0.9564 | 35.0 | 770 | 1.2847 | 0.7379 |
| 0.9564 | 36.0 | 792 | 1.5785 | 0.7531 |
| 0.9564 | 37.0 | 814 | 1.3492 | 0.7352 |
| 0.9564 | 38.0 | 836 | 1.1893 | 0.7185 |
| 0.9564 | 39.0 | 858 | 1.3252 | 0.7813 |
| 0.9564 | 40.0 | 880 | 1.6386 | 0.7796 |
| 0.9564 | 41.0 | 902 | 1.7053 | 0.7709 |
| 0.9564 | 42.0 | 924 | 1.3946 | 0.7597 |
| 0.9564 | 43.0 | 946 | 1.4878 | 0.7734 |
| 0.9564 | 44.0 | 968 | 1.7042 | 0.7750 |
| 0.9564 | 45.0 | 990 | 1.7275 | 0.7626 |
| 0.0307 | 46.0 | 1012 | 1.6951 | 0.7592 |
| 0.0307 | 47.0 | 1034 | 1.5251 | 0.7723 |
| 0.0307 | 48.0 | 1056 | 1.4254 | 0.7821 |
| 0.0307 | 49.0 | 1078 | 1.7755 | 0.7715 |

Framework versions

  • Transformers 4.48.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}