|
|
|
CamemBERT |
|
Overview |
|
The CamemBERT model was proposed in CamemBERT: a Tasty French Language Model by |
|
Louis Martin, Benjamin Muller, Pedro Javier Ortiz Suárez, Yoann Dupont, Laurent Romary, Éric Villemonte de la |
|
Clergerie, Djamé Seddah, and Benoît Sagot. It is based on Facebook's RoBERTa model released in 2019. It is a model |
|
trained on 138GB of French text. |
|
The abstract from the paper is the following: |
|
Pretrained language models are now ubiquitous in Natural Language Processing. Despite their success, most available |
|
models have either been trained on English data or on the concatenation of data in multiple languages. This makes |
|
practical use of such models --in all languages except English-- very limited. Aiming to address this issue for French, |
|
we release CamemBERT, a French version of the Bi-directional Encoders for Transformers (BERT). We measure the |
|
performance of CamemBERT compared to multilingual models in multiple downstream tasks, namely part-of-speech tagging, |
|
dependency parsing, named-entity recognition, and natural language inference. CamemBERT improves the state of the art |
|
for most of the tasks considered. We release the pretrained model for CamemBERT hoping to foster research and |
|
downstream applications for French NLP. |
|
This model was contributed by the ALMAnaCH team (Inria). The original code can be found here. |
|
|
|
This implementation is the same as RoBERTa. Refer to the documentation of RoBERTa for usage examples as well |
|
as the information relative to the inputs and outputs. |
|
|
|
Resources |
|
|
|
Text classification task guide |
|
Token classification task guide |
|
Question answering task guide |
|
Causal language modeling task guide |
|
Masked language modeling task guide |
|
Multiple choice task guide |
|
|
|
CamembertConfig |
|
[[autodoc]] CamembertConfig |
|
CamembertTokenizer |
|
[[autodoc]] CamembertTokenizer |
|
- build_inputs_with_special_tokens |
|
- get_special_tokens_mask |
|
- create_token_type_ids_from_sequences |
|
- save_vocabulary |
|
CamembertTokenizerFast |
|
[[autodoc]] CamembertTokenizerFast |
|
|
|
CamembertModel |
|
[[autodoc]] CamembertModel |
|
CamembertForCausalLM |
|
[[autodoc]] CamembertForCausalLM |
|
CamembertForMaskedLM |
|
[[autodoc]] CamembertForMaskedLM |
|
CamembertForSequenceClassification |
|
[[autodoc]] CamembertForSequenceClassification |
|
CamembertForMultipleChoice |
|
[[autodoc]] CamembertForMultipleChoice |
|
CamembertForTokenClassification |
|
[[autodoc]] CamembertForTokenClassification |
|
CamembertForQuestionAnswering |
|
[[autodoc]] CamembertForQuestionAnswering |
|
|
|
TFCamembertModel |
|
[[autodoc]] TFCamembertModel |
|
TFCamembertForCasualLM |
|
[[autodoc]] TFCamembertForCausalLM |
|
TFCamembertForMaskedLM |
|
[[autodoc]] TFCamembertForMaskedLM |
|
TFCamembertForSequenceClassification |
|
[[autodoc]] TFCamembertForSequenceClassification |
|
TFCamembertForMultipleChoice |
|
[[autodoc]] TFCamembertForMultipleChoice |
|
TFCamembertForTokenClassification |
|
[[autodoc]] TFCamembertForTokenClassification |
|
TFCamembertForQuestionAnswering |
|
[[autodoc]] TFCamembertForQuestionAnswering |
|
|
|
|