mT5-Small (SIB-200 Maltese)

This model is a fine-tuned version of google/mt5-small on the mlt_Latn (Maltese) subset of the Davlan/sib200 dataset. It achieves the following results on the test set:

  • Loss: 1.2509
  • F1: 0.7679

Intended uses & limitations

The model is fine-tuned for a specific task (sentence-level topic classification on SIB-200) and should be used for the same or a similar task. Any limitations present in the base model are inherited.
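
A minimal usage sketch is shown below. It assumes the model takes a raw Maltese sentence as input and generates the topic label as text, which is a common seq2seq classification setup; the exact input format and label strings depend on the fine-tuning script, so treat this as illustrative rather than the exact pipeline used.

```python
# Minimal inference sketch (illustrative; not the original training/evaluation script).
# Assumes the model generates the SIB-200 topic label as text for a raw Maltese sentence.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_id = "MLRS/mt5-small_sib200-mlt"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

text = "Il-futbol huwa l-isport l-aktar popolari f'Malta."  # illustrative Maltese sentence (sports-related)
inputs = tokenizer(text, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=10)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))  # expected: a topic label such as "sports"
```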

Training procedure

The model was fine-tuned using a customised script.

Training hyperparameters

The following hyperparameters were used during training (an illustrative sketch follows the list):

  • learning_rate: 0.001
  • train_batch_size: 32
  • eval_batch_size: 32
  • seed: 42
  • optimizer: Adafactor (no additional optimizer arguments)
  • lr_scheduler_type: linear
  • num_epochs: 200.0
  • early_stopping_patience: 20
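
Since the customised script is not published here, the snippet below is only a rough sketch of how these hyperparameters might map onto standard Hugging Face Seq2SeqTrainingArguments; the output_dir, evaluation/saving strategies, metric name, and early-stopping callback wiring are assumptions, not the authors' code.

```python
# Rough, assumed mapping of the reported hyperparameters onto Hugging Face Trainer
# arguments; the actual customised script may differ in structure and defaults.
from transformers import Seq2SeqTrainingArguments, EarlyStoppingCallback

training_args = Seq2SeqTrainingArguments(
    output_dir="mt5-small_sib200-mlt",  # hypothetical output path
    learning_rate=1e-3,
    per_device_train_batch_size=32,
    per_device_eval_batch_size=32,
    seed=42,
    optim="adafactor",
    lr_scheduler_type="linear",
    num_train_epochs=200,
    eval_strategy="epoch",              # evaluate (and save) once per epoch
    save_strategy="epoch",
    load_best_model_at_end=True,        # needed for early stopping
    metric_for_best_model="f1",         # assumes the eval metric is logged as "f1"
)

# Early stopping with a patience of 20 evaluations would be attached as a callback:
# Seq2SeqTrainer(..., args=training_args,
#                callbacks=[EarlyStoppingCallback(early_stopping_patience=20)])
```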

Training results

| Training Loss | Epoch | Step | Validation Loss | F1 |
|:--------------|:------|:-----|:----------------|:-------|
| No log | 1.0 | 22 | 3.0282 | 0.32 |
| No log | 2.0 | 44 | 1.0814 | 0.4516 |
| No log | 3.0 | 66 | 0.7254 | 0.5333 |
| No log | 4.0 | 88 | 0.7760 | 0.5396 |
| No log | 5.0 | 110 | 0.6758 | 0.6601 |
| No log | 6.0 | 132 | 0.5517 | 0.6611 |
| No log | 7.0 | 154 | 0.5050 | 0.6943 |
| No log | 8.0 | 176 | 0.4567 | 0.6962 |
| No log | 9.0 | 198 | 0.4953 | 0.7500 |
| No log | 10.0 | 220 | 0.5387 | 0.7282 |
| No log | 11.0 | 242 | 0.6183 | 0.7326 |
| No log | 12.0 | 264 | 0.6277 | 0.6987 |
| No log | 13.0 | 286 | 0.5040 | 0.7696 |
| No log | 14.0 | 308 | 0.6528 | 0.7812 |
| No log | 15.0 | 330 | 0.5787 | 0.7209 |
| No log | 16.0 | 352 | 0.6691 | 0.7653 |
| No log | 17.0 | 374 | 0.6764 | 0.7445 |
| No log | 18.0 | 396 | 0.8734 | 0.7762 |
| No log | 19.0 | 418 | 1.3617 | 0.7327 |
| No log | 20.0 | 440 | 1.2030 | 0.7778 |
| No log | 21.0 | 462 | 1.1081 | 0.7680 |
| No log | 22.0 | 484 | 1.2747 | 0.7693 |
| 0.9564 | 23.0 | 506 | 0.9481 | 0.7459 |
| 0.9564 | 24.0 | 528 | 0.8424 | 0.7921 |
| 0.9564 | 25.0 | 550 | 1.2598 | 0.7872 |
| 0.9564 | 26.0 | 572 | 1.0477 | 0.7792 |
| 0.9564 | 27.0 | 594 | 0.8808 | 0.7475 |
| 0.9564 | 28.0 | 616 | 1.1499 | 0.7736 |
| 0.9564 | 29.0 | 638 | 1.0573 | 0.7944 |
| 0.9564 | 30.0 | 660 | 1.0368 | 0.7934 |
| 0.9564 | 31.0 | 682 | 1.1419 | 0.7547 |
| 0.9564 | 32.0 | 704 | 1.3188 | 0.7738 |
| 0.9564 | 33.0 | 726 | 1.1331 | 0.7661 |
| 0.9564 | 34.0 | 748 | 1.6081 | 0.7578 |
| 0.9564 | 35.0 | 770 | 1.2847 | 0.7379 |
| 0.9564 | 36.0 | 792 | 1.5785 | 0.7531 |
| 0.9564 | 37.0 | 814 | 1.3492 | 0.7352 |
| 0.9564 | 38.0 | 836 | 1.1893 | 0.7185 |
| 0.9564 | 39.0 | 858 | 1.3252 | 0.7813 |
| 0.9564 | 40.0 | 880 | 1.6386 | 0.7796 |
| 0.9564 | 41.0 | 902 | 1.7053 | 0.7709 |
| 0.9564 | 42.0 | 924 | 1.3946 | 0.7597 |
| 0.9564 | 43.0 | 946 | 1.4878 | 0.7734 |
| 0.9564 | 44.0 | 968 | 1.7042 | 0.7750 |
| 0.9564 | 45.0 | 990 | 1.7275 | 0.7626 |
| 0.0307 | 46.0 | 1012 | 1.6951 | 0.7592 |
| 0.0307 | 47.0 | 1034 | 1.5251 | 0.7723 |
| 0.0307 | 48.0 | 1056 | 1.4254 | 0.7821 |
| 0.0307 | 49.0 | 1078 | 1.7755 | 0.7715 |

Framework versions

  • Transformers 4.48.2
  • PyTorch 2.4.1+cu121
  • Datasets 3.2.0
  • Tokenizers 0.21.0

License

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License. Permissions beyond the scope of this license may be available at https://mlrs.research.um.edu.mt/.


Citation

This work was first presented in MELABenchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource Maltese NLP. Cite it as follows:

@inproceedings{micallef-borg-2025-melabenchv1,
    title = "{MELAB}enchv1: Benchmarking Large Language Models against Smaller Fine-Tuned Models for Low-Resource {M}altese {NLP}",
    author = "Micallef, Kurt  and
      Borg, Claudia",
    editor = "Che, Wanxiang  and
      Nabende, Joyce  and
      Shutova, Ekaterina  and
      Pilehvar, Mohammad Taher",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2025",
    month = jul,
    year = "2025",
    address = "Vienna, Austria",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2025.findings-acl.1053/",
    doi = "10.18653/v1/2025.findings-acl.1053",
    pages = "20505--20527",
    ISBN = "979-8-89176-256-5",
}