---
library_name: transformers
license: cc-by-nc-4.0
base_model: facebook/nllb-200-3.3B
tags:
- generated_from_trainer
metrics:
- bleu
- chrf
- comet
model-index:
- name: nllb-200-3.3B-bem2en-flores200-bt
  results: []
datasets:
- kreasof-ai/bigc-bem-eng
- kreasof-ai/flores200-eng-bem
- kreasof-ai/tatoeba-eng-bem-backtranslation
language:
- bem
- en
---

<!-- This model card has been generated automatically according to the information the Trainer had access to. You
should probably proofread and complete it, then remove this comment. -->

# nllb-200-3.3B-bem2en-flores200-bt

This model is a fine-tuned version of [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B) on the [Big-C dataset](https://huggingface.co/datasets/kreasof-ai/bem-eng-bigc), the [augmented Tatoeba dataset](https://huggingface.co/datasets/kreasof-ai/tatoeba-eng-bem-backtranslation), and the [FLORES-200 dataset](https://huggingface.co/datasets/kreasof-ai/flores200-eng-bem).
It achieves the following results on the evaluation set:
- Loss: 0.2028
- Bleu: 27.8
- Chrf: 51.39

## Model description

This is a Bemba-to-English translation model, fine-tuned from [facebook/nllb-200-3.3B](https://huggingface.co/facebook/nllb-200-3.3B).

## Intended uses

This model is applied to the Bemba-to-English translation task as part of the IWSLT 2025 Low-Resource Track.
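
A minimal inference sketch is shown below. The repository id is a placeholder for this model's actual Hub path, the example sentence is illustrative, and the NLLB-200 FLORES-style language codes `bem_Latn` (Bemba) and `eng_Latn` (English) are assumed.

```python
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Placeholder repository id; replace with the actual Hub path of this model.
model_id = "kreasof-ai/nllb-200-3.3B-bem2en-flores200-bt"

# NLLB-200 uses FLORES-style language codes: bem_Latn (Bemba) -> eng_Latn (English).
tokenizer = AutoTokenizer.from_pretrained(model_id, src_lang="bem_Latn")
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

bemba_text = "Muli shani?"  # illustrative input sentence
inputs = tokenizer(bemba_text, return_tensors="pt")

generated = model.generate(
    **inputs,
    # Force the decoder to start in English.
    forced_bos_token_id=tokenizer.convert_tokens_to_ids("eng_Latn"),
    max_new_tokens=128,
)
print(tokenizer.batch_decode(generated, skip_special_tokens=True)[0])
```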

## Training and evaluation data

This model was trained on the `train+val` splits of the Big-C dataset, the `train` split of the augmented Tatoeba dataset, and the `dev` split of the FLORES-200 dataset. For evaluation, it used the `test` split of Big-C and the `devtest` split of FLORES-200.
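
As an illustration, this split composition could be assembled with the `datasets` library roughly as follows. The repository ids come from this card's metadata, while the exact split names (e.g. `val` vs. `validation`) and compatible column schemas are assumptions.

```python
from datasets import load_dataset, concatenate_datasets

# Repository ids taken from the card metadata; split names are assumed.
bigc = load_dataset("kreasof-ai/bigc-bem-eng")
tatoeba = load_dataset("kreasof-ai/tatoeba-eng-bem-backtranslation")
flores = load_dataset("kreasof-ai/flores200-eng-bem")

# Training mix: Big-C train+val, augmented Tatoeba train, FLORES-200 dev.
train_data = concatenate_datasets(
    [bigc["train"], bigc["val"], tatoeba["train"], flores["dev"]]
)

# Evaluation: Big-C test and FLORES-200 devtest, kept as separate sets.
eval_data = {"bigc_test": bigc["test"], "flores_devtest": flores["devtest"]}
```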

## Training procedure

### Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 0.0001
- train_batch_size: 8
- eval_batch_size: 8
- seed: 42
- optimizer: adamw_torch with betas=(0.9, 0.999) and epsilon=1e-08 (no additional optimizer arguments)
- lr_scheduler_type: linear
- lr_scheduler_warmup_ratio: 0.03
- num_epochs: 3
- mixed_precision_training: Native AMP
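
For reference, here is a hedged sketch of how these settings map onto `Seq2SeqTrainingArguments` in the Transformers 4.51 API; the `output_dir` and any options not listed above are illustrative rather than taken from the training run.

```python
from transformers import Seq2SeqTrainingArguments

# Mirrors the hyperparameters listed above; output_dir is illustrative.
training_args = Seq2SeqTrainingArguments(
    output_dir="nllb-200-3.3B-bem2en-flores200-bt",
    learning_rate=1e-4,
    per_device_train_batch_size=8,
    per_device_eval_batch_size=8,
    seed=42,
    optim="adamw_torch",        # AdamW with default betas=(0.9, 0.999), eps=1e-8
    lr_scheduler_type="linear",
    warmup_ratio=0.03,
    num_train_epochs=3,
    fp16=True,                  # Native AMP mixed-precision training
    predict_with_generate=True, # assumed, needed for BLEU/ChrF during eval
)
```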

### Training results

| Training Loss | Epoch | Step  | Validation Loss | Bleu  | Chrf  |
|:-------------:|:-----:|:-----:|:---------------:|:-----:|:-----:|
| 0.1535        | 1.0   | 13236 | 0.1746          | 26.88 | 51.02 |
| 0.1004        | 2.0   | 26472 | 0.1694          | 28.1  | 51.65 |
| 0.0504        | 3.0   | 39708 | 0.2028          | 27.8  | 51.39 |

### Model Evaluation
Performance of this model was evaluated using BLEU, ChrF++, and AfriCOMET on the `devtest` split of the [FLORES-200 dataset](https://huggingface.co/datasets/kreasof-ai/flores200-eng-bem).

| Commit Hash | BLEU  | ChrF++ | AfriCOMET |
|:-----------:|:-----:|:------:|:---------:|
| 3dc4f       | 25.06 | 47.61  | 58.6      |
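
Below is a minimal sketch of how the BLEU and ChrF++ scores could be reproduced with `sacreBLEU`; the variable contents are placeholders, and AfriCOMET scoring is handled by a separate COMET-based toolkit and is not shown here.

```python
import sacrebleu

# Placeholder data: model outputs on FLORES-200 devtest and the English references.
hypotheses = ["The children are playing outside."]
references = [["The children are playing outside."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
chrf = sacrebleu.corpus_chrf(hypotheses, references, word_order=2)  # word_order=2 gives ChrF++

print(f"BLEU: {bleu.score:.2f}  ChrF++: {chrf.score:.2f}")
```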

### Framework versions

- Transformers 4.51.3
- Pytorch 2.2.0+cu121
- Datasets 3.5.0
- Tokenizers 0.21.1

## Citation
```
@inproceedings{nllb2022,
  title     = {No Language Left Behind: Scaling Human-Centered Machine Translation},
  author    = {Costa-jussà, Marta R. and Cross, James and others},
  booktitle = {Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing (EMNLP)},
  year      = {2022},
  publisher = {Association for Computational Linguistics},
  url       = {https://aclanthology.org/2022.emnlp-main.9}
}
@inproceedings{sikasote-etal-2023-big,
    title = "{BIG}-{C}: a Multimodal Multi-Purpose Dataset for {B}emba",
    author = "Sikasote, Claytone  and
      Mukonde, Eunice  and
      Alam, Md Mahfuz Ibn  and
      Anastasopoulos, Antonios",
    editor = "Rogers, Anna  and
      Boyd-Graber, Jordan  and
      Okazaki, Naoaki",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)",
    month = jul,
    year = "2023",
    address = "Toronto, Canada",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2023.acl-long.115",
    doi = "10.18653/v1/2023.acl-long.115",
    pages = "2062--2078",
    abstract = "We present BIG-C (Bemba Image Grounded Conversations), a large multimodal dataset for Bemba. While Bemba is the most populous language of Zambia, it exhibits a dearth of resources which render the development of language technologies or language processing research almost impossible. The dataset is comprised of multi-turn dialogues between Bemba speakers based on images, transcribed and translated into English. There are more than 92,000 utterances/sentences, amounting to more than 180 hours of audio data with corresponding transcriptions and English translations. We also provide baselines on speech recognition (ASR), machine translation (MT) and speech translation (ST) tasks, and sketch out other potential future multimodal uses of our dataset. We hope that by making the dataset available to the research community, this work will foster research and encourage collaboration across the language, speech, and vision communities especially for languages outside the {``}traditionally{''} used high-resourced ones. All data and code are publicly available: [\url{https://github.com/csikasote/bigc}](\url{https://github.com/csikasote/bigc}).",
}
@inproceedings{wang-etal-2024-afrimte,
    title = "{A}fri{MTE} and {A}fri{COMET}: Enhancing {COMET} to Embrace Under-resourced {A}frican Languages",
    author = "Wang, Jiayi and Adelani, David and Agrawal, Sweta and Masiak, Marek and Rei, Ricardo and Briakou, Eleftheria and Carpuat, Marine and He, Xuanli and others",
    booktitle = "Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers)",
    month = "jun",
    year = "2024",
    address = "Mexico City, Mexico",
    publisher = "Association for Computational Linguistics",
    url = "https://aclanthology.org/2024.naacl-long.334/",
    doi = "10.18653/v1/2024.naacl-long.334",
    pages = "5997--6023"
}
@inproceedings{wang2024evaluating,
  title={Evaluating WMT 2024 Metrics Shared Task Submissions on AfriMTE (the African Challenge Set)},
  author={Wang, Jiayi and Adelani, David Ifeoluwa and Stenetorp, Pontus},
  booktitle={Proceedings of the Ninth Conference on Machine Translation},
  pages={505--516},
  year={2024}
}
@inproceedings{freitag2024llms,
  title={Are LLMs breaking MT metrics? results of the WMT24 metrics shared task},
  author={Freitag, Markus and Mathur, Nitika and Deutsch, Daniel and Lo, Chi-Kiu and Avramidis, Eleftherios and Rei, Ricardo and Thompson, Brian and Blain, Frederic and Kocmi, Tom and Wang, Jiayi and others},
  booktitle={Proceedings of the Ninth Conference on Machine Translation},
  pages={47--81},
  year={2024}
}
```
## Contact
This model was trained by [Hazim](https://huggingface.co/cobrayyxx).
## Acknowledgments
Huge thanks to [Yasmin Moslem](https://huggingface.co/ymoslem) for her supervision, and to [Habibullah Akbar](https://huggingface.co/ChavyvAkvar), the founder of Kreasof-AI, for his leadership and support.