Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
[PreTrainedModel] and [TFPreTrainedModel] also implement a few methods which
are common among all the models to:
resize the input token embeddings when new tokens are added to the vocabulary
prune the attention heads of the model.