# mT5

## Overview

The mT5 model was presented in [mT5: A massively multilingual pre-trained text-to-text transformer](https://arxiv.org/abs/2010.11934) by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya Barua, Colin Raffel.

The abstract from the paper is the following:

*The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model checkpoints used in this work are publicly available.*

Note: mT5 was only pre-trained on mC4, with no supervised training. As a result, this model has to be fine-tuned before it is usable on a downstream task, unlike the original T5 model. Since mT5 was pre-trained without supervision, there is no real advantage to using a task prefix during single-task fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix, as in the sketch below.

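A minimal fine-tuning sketch; the `"summarize: "` prefix is an illustrative choice for a multi-task setup, not a string mT5 was trained on:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# In a multi-task setup, prepend a task prefix of your choosing to each input;
# for single-task fine-tuning the prefix can simply be omitted.
inputs = tokenizer(
    "summarize: Studies have shown that owning a dog is good for you.",
    return_tensors="pt",
)
labels = tokenizer(
    text_target="Owning a dog is good for you.", return_tensors="pt"
).input_ids

# One forward pass of a training step; backpropagate outputs.loss in a loop.
outputs = model(**inputs, labels=labels)
print(outputs.loss)
```
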
Google has released the following variants:

- [google/mt5-small](https://huggingface.co/google/mt5-small)
- [google/mt5-base](https://huggingface.co/google/mt5-base)
- [google/mt5-large](https://huggingface.co/google/mt5-large)
- [google/mt5-xl](https://huggingface.co/google/mt5-xl)
- [google/mt5-xxl](https://huggingface.co/google/mt5-xxl)

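Because these checkpoints only saw the unsupervised span-corruption objective, the raw models can only fill in sentinel-marked spans. A minimal sketch of that objective, using the same sentinel tokens (`<extra_id_0>`, `<extra_id_1>`, ...) as T5:

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")

# The input hides spans behind sentinel tokens; the target spells out the
# hidden spans, each introduced by its sentinel.
input_ids = tokenizer(
    "The <extra_id_0> walks in <extra_id_1> park", return_tensors="pt"
).input_ids
labels = tokenizer(
    "<extra_id_0> cute dog <extra_id_1> the <extra_id_2>", return_tensors="pt"
).input_ids

loss = model(input_ids=input_ids, labels=labels).loss
print(loss)
```
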
This model was contributed by [patrickvonplaten](https://huggingface.co/patrickvonplaten). The original code can be found [here](https://github.com/google-research/multilingual-t5).

## Resources

- [Translation task guide](../tasks/translation)
- [Summarization task guide](../tasks/summarization)

## MT5Config

[[autodoc]] MT5Config

## MT5Tokenizer

[[autodoc]] MT5Tokenizer

See [`T5Tokenizer`] for all details.

## MT5TokenizerFast

[[autodoc]] MT5TokenizerFast

See [`T5TokenizerFast`] for all details.

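Both tokenizers share a single large SentencePiece vocabulary (roughly 250k entries) covering all 101 pre-training languages, so the same tokenizer handles any input language. A short sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")

# The same vocabulary segments text in any of the covered languages.
print(tokenizer.tokenize("The quick brown fox"))
print(tokenizer.tokenize("Der schnelle braune Fuchs"))
```
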
## MT5Model

[[autodoc]] MT5Model

## MT5ForConditionalGeneration

[[autodoc]] MT5ForConditionalGeneration

## MT5EncoderModel

[[autodoc]] MT5EncoderModel

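`MT5EncoderModel` exposes only the encoder stack, which is useful for extracting token-level representations. A minimal sketch:

```python
import torch
from transformers import AutoTokenizer, MT5EncoderModel

tokenizer = AutoTokenizer.from_pretrained("google/mt5-small")
model = MT5EncoderModel.from_pretrained("google/mt5-small")

inputs = tokenizer("Hello, world!", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# Encoder hidden states of shape (batch_size, sequence_length, d_model).
print(outputs.last_hidden_state.shape)
```
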
## MT5ForSequenceClassification

[[autodoc]] MT5ForSequenceClassification

## MT5ForTokenClassification

[[autodoc]] MT5ForTokenClassification

## MT5ForQuestionAnswering

[[autodoc]] MT5ForQuestionAnswering

## TFMT5Model

[[autodoc]] TFMT5Model

## TFMT5ForConditionalGeneration

[[autodoc]] TFMT5ForConditionalGeneration

## TFMT5EncoderModel

[[autodoc]] TFMT5EncoderModel

## FlaxMT5Model

[[autodoc]] FlaxMT5Model

## FlaxMT5ForConditionalGeneration

[[autodoc]] FlaxMT5ForConditionalGeneration

## FlaxMT5EncoderModel

[[autodoc]] FlaxMT5EncoderModel