To avoid the mismatch between embedding matrix size and vocab size, the tokenizer for GPT-J contains 143 extra tokens <|extratoken_1|>... <|extratoken_143|>, so the vocab_size of the tokenizer also becomes 50400.
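A minimal sketch of how this plays out in practice, assuming the transformers library and the EleutherAI/gpt-j-6B checkpoint; the exact token ids shown in the comments are expectations based on the 50257-token GPT-2 base vocabulary plus the 143 extra tokens:

```python
from transformers import AutoTokenizer

# Load the GPT-J tokenizer (a GPT-2 tokenizer with 143 extra tokens appended).
tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-j-6B")

# Total vocabulary, including the <|extratoken_*|> entries, matches the
# 50400-row embedding matrix of the model.
print(len(tokenizer))  # expected: 50400

# The extra tokens occupy the ids after the original GPT-2 vocabulary.
print(tokenizer.convert_tokens_to_ids("<|extratoken_1|>"))    # expected: 50257
print(tokenizer.convert_tokens_to_ids("<|extratoken_143|>"))  # expected: 50399
```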