from transformers import AutoTokenizer, MBartForConditionalGeneration

bartpho = MBartForConditionalGeneration.from_pretrained("vinai/bartpho-syllable")
tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")
TXT = "Chúng tôi là <mask> nghiên cứu viên."  # Vietnamese input containing the mask token
input_ids = tokenizer([TXT], return_tensors="pt")["input_ids"]
logits = bartpho(input_ids).logits
masked_index = (input_ids[0] == tokenizer.mask_token_id).nonzero().item()
probs = logits[0, masked_index].softmax(dim=0)
values, predictions = probs.topk(5)
print(tokenizer.decode(predictions).split())  # top-5 candidate tokens for the masked position
This implementation covers tokenization only: "monolingual_vocab_file" consists of Vietnamese-specialized token types extracted from the pre-trained SentencePiece model "vocab_file", which is available from the multilingual XLM-RoBERTa.
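In practice this means the SentencePiece model segments Vietnamese text and the resulting pieces are mapped to ids in the reduced monolingual vocabulary rather than the full XLM-RoBERTa one. A minimal sketch of inspecting this, assuming the publicly available "vinai/bartpho-syllable" checkpoint:

```python
from transformers import AutoTokenizer

# Checkpoint name is an assumption for illustration; BARTpho also ships a word-level variant.
tokenizer = AutoTokenizer.from_pretrained("vinai/bartpho-syllable")

# SentencePiece segmentation of a Vietnamese sentence.
tokens = tokenizer.tokenize("Chúng tôi là những nghiên cứu viên.")
print(tokens)

# Ids are looked up in the Vietnamese-specialized monolingual vocabulary.
print(tokenizer.convert_tokens_to_ids(tokens))
```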