input_ids = tokenizer(input_ids_prompt).input_ids
# Note that we cannot add "{extra_id_}" to the string directly,
# as the byte tokenizer would incorrectly merge the tokens.
# For ByT5, we need to work directly on the character level.
# Contrary to T5, ByT5 does not use sentinel tokens for masking,
# but instead uses the final UTF character ids.
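A minimal, self-contained sketch of what "working on the character level" means here, without the tokenizer object: it assumes the common ByT5 convention that ids 0-2 are special tokens (pad, eos, unk), so each UTF-8 byte value b maps to token id b + 3, and that mask/sentinel ids sit at the top of the byte range (258, 257, ... in the Hugging Face ByT5 example). Both the `byt5_encode` helper and the concrete sentinel value 258 are assumptions for illustration, not the library API.

```python
def byt5_encode(text: str) -> list[int]:
    """ByT5-style encoding: one token id per UTF-8 byte, offset by the
    3 special tokens (pad=0, eos=1, unk=2). Assumed convention."""
    return [b + 3 for b in text.encode("utf-8")]


prompt = "The dog chases a ball in the park."
ids = byt5_encode(prompt)

# Mask the span "chases " at the byte level by slicing the id list
# and splicing in a single sentinel id (258 is an assumed value).
start = prompt.index("chases")
end = start + len("chases ")
masked = ids[:start] + [258] + ids[end:]
```

Because every character is one or more raw bytes, span masking is plain list slicing on the ids; inserting the string "<extra_id_0>" into the text instead would itself be byte-encoded rather than treated as one sentinel token.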