Check copies Since the Transformers library is very opinionated with respect to model code, and each model should fully be implemented in a single file without relying on other models, we have added a mechanism that checks whether a copy of the code of a layer of a given model stays consistent with the original. This way, when there is a bug fix, we can see all other impacted models and choose to trickle down the modification or break the copy. If a file is a full copy of another file, you should register it in the constant FULL_COPIES of utils/check_copies.py. This mechanism relies on comments of the form # Copied from xxx. The xxx should contain the whole path to the class of function which is being copied below. For instance, RobertaSelfOutput is a direct copy of the BertSelfOutput class, so you can see here it has a comment: Copied from transformers.models.bert.modeling_bert.BertSelfOutput Note that instead of applying this to a whole class, you can apply it to the relevant methods that are copied from. For instance here you can see how RobertaPreTrainedModel._init_weights is copied from the same method in BertPreTrainedModel with the comment: Copied from transformers.models.bert.modeling_bert.BertPreTrainedModel._init_weights Sometimes the copy is exactly the same except for names: for instance in RobertaAttention, we use RobertaSelfAttention insted of BertSelfAttention but other than that, the code is exactly the same. This is why # Copied from supports simple string replacements with the following syntax: Copied from xxx with foo->bar. This means the code is copied with all instances of foo being replaced by bar. You can see how it used here in RobertaAttention with the comment: Copied from transformers.models.bert.modeling_bert.BertAttention with Bert->Roberta Note that there shouldn't be any spaces around the arrow (unless that space is part of the pattern to replace of course). You can add several patterns separated by a comma. For instance here CamemberForMaskedLM is a direct copy of RobertaForMaskedLM with two replacements: Roberta to Camembert and ROBERTA to CAMEMBERT. You can see here this is done with the comment: Copied from transformers.models.roberta.modeling_roberta.RobertaForMaskedLM with Roberta->Camembert, ROBERTA->CAMEMBERT If the order matters (because one of the replacements might conflict with a previous one), the replacements are executed from left to right. If the replacements change the formatting (if you replace a short name by a very long name for instance), the copy is checked after applying the auto-formatter. Another way when the patterns are just different casings of the same replacement (with an uppercased and a lowercased variants) is just to add the option all-casing. Here is an example in MobileBertForSequenceClassification with the comment: Copied from transformers.models.bert.modeling_bert.BertForSequenceClassification with Bert->MobileBert all-casing In this case, the code is copied from BertForSequenceClassification by replacing: - Bert by MobileBert (for instance when using MobileBertModel in the init) - bert by mobilebert (for instance when defining self.mobilebert) - BERT by MOBILEBERT (in the constant MOBILEBERT_INPUTS_DOCSTRING)