from transformers import AutoTokenizer, MBartForConditionalGeneration

# BartPho is used through the mBART classes; "vinai/bartpho-syllable" is the released checkpoint
bartpho = MBartForConditionalGeneration.from_pretrained("vinai/bartpho-syllable")
tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
TXT = "Chúng tôi là <mask> nghiên cứu viên."  # input with a masked token to fill
input_ids = tokenizer([TXT], return_tensors="pt")["input_ids"]
logits = bartpho(input_ids).logits
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)
print(tokenizer.decode(predictions).split())
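As a small follow-up sketch (not part of the original example), the top-5 candidates returned above can also be inspected one by one together with their probabilities; this assumes only the tensors `values` and `predictions` and the `tokenizer.decode` call already used in the snippet.

# Hypothetical follow-up: print each candidate token with its probability
for token_id, score in zip(predictions.tolist(), values.tolist()):
    print(f"{tokenizer.decode([token_id])!r}: {score:.4f}")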
This implementation covers tokenization only: "monolingual_vocab_file" consists of Vietnamese-specialized types extracted from the pre-trained SentencePiece model "vocab_file", which is available from the multilingual XLM-RoBERTa.
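For illustration, here is a minimal sketch of how these two files come together when instantiating the tokenizer from local files. The file names below are placeholders (assumptions, not taken from the original text); the keyword arguments follow the BartphoTokenizer constructor in Transformers.

from transformers import BartphoTokenizer

# Placeholder file names: the shared XLM-RoBERTa SentencePiece model and the
# Vietnamese-specialized vocabulary extracted from it.
tokenizer = BartphoTokenizer(
    vocab_file="sentencepiece.bpe.model",
    monolingual_vocab_file="dict.txt",
)
print(tokenizer.tokenize("Chúng tôi là những nghiên cứu viên."))

In practice, loading the tokenizer via AutoTokenizer.from_pretrained("vinai/bartpho-syllable") fetches both files from the checkpoint, so constructing BartphoTokenizer by hand is only needed when working from local copies.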