To avoid a mismatch between the embedding matrix size and the vocabulary size, the GPT-J tokenizer contains 143 extra tokens
`<|extratoken_1|>` ... `<|extratoken_143|>`, so the tokenizer's `vocab_size` also becomes 50400.
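
As a minimal sketch (assuming the `EleutherAI/gpt-j-6B` checkpoint), the sizes can be inspected like this; note that depending on how the extra tokens are registered, they may be counted by `len(tokenizer)` rather than by `tokenizer.vocab_size`:

```python
from transformers import AutoTokenizer

# Sketch: inspect the GPT-J tokenizer sizes (assumes the EleutherAI/gpt-j-6B checkpoint).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# Base GPT-2 BPE vocabulary (50257 tokens).
print(tokenizer.vocab_size)

# Total size including the 143 <|extratoken_*|> entries, matching the
# model's embedding matrix size of 50400.
print(len(tokenizer))
```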