Ahmadzei's picture
added 3 more tables for large emb model
5fa1a76
When the tokenizer is a "Fast" tokenizer (i.e., backed by
HuggingFace tokenizers library), this class provides in addition
several advanced alignment methods which can be used to map between the original string (character and words) and the
token space (e.g., getting the index of the token comprising a given character or the span of characters corresponding
to a given token).