We can pass a list to the tokenizer and ask it to pad like this:

```python
padded_sequences = tokenizer([sequence_a, sequence_b], padding=True)
```

We can see that 0s have been added on the right of the first sentence to make it the same length as the second one:

```python
padded_sequences["input_ids"]
# [[101, 1188, 1110, 170, 1603, 4954, 119, 102, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0],
#  [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110, 1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]]
```

This can then be converted into a tensor in PyTorch or TensorFlow.
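To make the padding step concrete, here is a minimal pure-Python sketch of right-padding a batch to the length of its longest sequence. It is an illustration only, not the tokenizer's actual implementation: `pad_right` is a hypothetical helper, and the padding ID of 0 is assumed from the output shown above. The unpadded token IDs are copied from that same output.

```python
def pad_right(sequences, pad_id=0):
    """Right-pad each sequence with pad_id up to the longest length in the batch.

    Hypothetical helper for illustration; pad_id=0 is an assumption
    based on the zeros in the example output.
    """
    max_len = max(len(seq) for seq in sequences)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in sequences]


# Token IDs taken from the example output, with the padding removed.
ids_a = [101, 1188, 1110, 170, 1603, 4954, 119, 102]
ids_b = [101, 1188, 1110, 170, 1897, 1263, 4954, 119, 1135, 1110,
         1120, 1655, 2039, 1190, 1103, 4954, 138, 119, 102]

batch = pad_right([ids_a, ids_b])

# Every row now has the same length, so the batch is rectangular.
print([len(row) for row in batch])
```

Equal-length rows are what allow the batch to be stacked into a single rectangular tensor; lists of differing lengths cannot be converted directly.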