When the tokenizer is a "Fast" tokenizer (i.e., backed by | |
HuggingFace tokenizers library), this class provides in addition | |
several advanced alignment methods which can be used to map between the original string (character and words) and the | |
token space (e.g., getting the index of the token comprising a given character or the span of characters corresponding | |
to a given token). |