RAG / knowledge_base /model_doc_mt5.txt
Ahmadzei's picture
update 1
57bdca5
mT5
Overview
The mT5 model was presented in mT5: A massively multilingual pre-trained text-to-text transformer by Linting Xue, Noah Constant, Adam Roberts, Mihir Kale, Rami Al-Rfou, Aditya
Siddhant, Aditya Barua, Colin Raffel.
The abstract from the paper is the following:
The recent "Text-to-Text Transfer Transformer" (T5) leveraged a unified text-to-text format and scale to attain
state-of-the-art results on a wide variety of English-language NLP tasks. In this paper, we introduce mT5, a
multilingual variant of T5 that was pre-trained on a new Common Crawl-based dataset covering 101 languages. We detail
the design and modified training of mT5 and demonstrate its state-of-the-art performance on many multilingual
benchmarks. We also describe a simple technique to prevent "accidental translation" in the zero-shot setting, where a
generative model chooses to (partially) translate its prediction into the wrong language. All of the code and model
checkpoints used in this work are publicly available.
Note: mT5 was only pre-trained on mC4 excluding any supervised training.
Therefore, this model has to be fine-tuned before it is usable on a downstream task, unlike the original T5 model.
Since mT5 was pre-trained unsupervisedly, there's no real advantage to using a task prefix during single-task
fine-tuning. If you are doing multi-task fine-tuning, you should use a prefix.
Google has released the following variants:
google/mt5-small
google/mt5-base
google/mt5-large
google/mt5-xl
google/mt5-xxl.
This model was contributed by patrickvonplaten. The original code can be
found here.
Resources
Translation task guide
Summarization task guide
MT5Config
[[autodoc]] MT5Config
MT5Tokenizer
[[autodoc]] MT5Tokenizer
See [T5Tokenizer] for all details.
MT5TokenizerFast
[[autodoc]] MT5TokenizerFast
See [T5TokenizerFast] for all details.
MT5Model
[[autodoc]] MT5Model
MT5ForConditionalGeneration
[[autodoc]] MT5ForConditionalGeneration
MT5EncoderModel
[[autodoc]] MT5EncoderModel
MT5ForSequenceClassification
[[autodoc]] MT5ForSequenceClassification
MT5ForTokenClassification
[[autodoc]] MT5ForTokenClassification
MT5ForQuestionAnswering
[[autodoc]] MT5ForQuestionAnswering
TFMT5Model
[[autodoc]] TFMT5Model
TFMT5ForConditionalGeneration
[[autodoc]] TFMT5ForConditionalGeneration
TFMT5EncoderModel
[[autodoc]] TFMT5EncoderModel
FlaxMT5Model
[[autodoc]] FlaxMT5Model
FlaxMT5ForConditionalGeneration
[[autodoc]] FlaxMT5ForConditionalGeneration
FlaxMT5EncoderModel
[[autodoc]] FlaxMT5EncoderModel