The token indices are under the key "input_ids":

```python
encoded_sequence = inputs["input_ids"]
print(encoded_sequence)
[101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]
```

Note that the tokenizer automatically adds "special tokens" (if the associated model relies on them). These are reserved
IDs that the model uses for structural purposes; in the output above, the first and last IDs (101 and 102) are such tokens.
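
To make the effect of special tokens concrete, the following sketch separates them from the rest of the sequence in plain Python. It assumes that 101 and 102 are the special IDs in this vocabulary; `SPECIAL_IDS` and `split_special_tokens` are hypothetical names used for illustration only and are not part of the library.

```python
# Assumption: 101 and 102 are the special-token IDs for this vocabulary.
SPECIAL_IDS = {101, 102}

# The encoded sequence printed above.
encoded_sequence = [101, 138, 18696, 155, 1942, 3190, 1144, 1572, 13745, 1104, 159, 9664, 2107, 102]


def split_special_tokens(ids, special_ids):
    """Return (content_ids, special_positions) for a list of token IDs."""
    # IDs that belong to the original text.
    content_ids = [i for i in ids if i not in special_ids]
    # Positions at which the tokenizer inserted a special token.
    special_positions = [pos for pos, i in enumerate(ids) if i in special_ids]
    return content_ids, special_positions


content_ids, special_positions = split_special_tokens(encoded_sequence, SPECIAL_IDS)
print(content_ids)
print(special_positions)  # [0, 13]
```

With a real tokenizer, the special IDs would normally not be hard-coded as they are here; they can typically be read from the tokenizer's `all_special_ids` attribute.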