Example of using a model with MeCab and WordPiece tokenization:

```python
import torch
from transformers import AutoModel, AutoTokenizer

# Load the model and its matching tokenizer (MeCab word segmentation + WordPiece)
bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")

# Input Japanese text
line = "吾輩は猫である。"

# Encode the sentence as PyTorch tensors
inputs = tokenizer(line, return_tensors="pt")

print(tokenizer.decode(inputs["input_ids"][0]))
# [CLS] 吾輩 は 猫 で ある 。 [SEP]

outputs = bertjapanese(**inputs)
```

Example of using a model with character tokenization:

```python
from transformers import AutoModel, AutoTokenizer

# Load the character-level variant of the model and its tokenizer
bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese-char")
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-char")

# Input Japanese text
line = "吾輩は猫である。"

# Encode the sentence as PyTorch tensors
inputs = tokenizer(line, return_tensors="pt")

print(tokenizer.decode(inputs["input_ids"][0]))
# [CLS] 吾 輩 は 猫 で あ る 。 [SEP]

outputs = bertjapanese(**inputs)
```

This model was contributed by cl-tohoku.
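To make the difference between the two tokenization schemes above more concrete, here is a minimal, dependency-free sketch. It is not the library's implementation. The pre-segmented `words` list stands in for the output of a morphological analyzer such as MeCab, and `toy_vocab`, `wordpiece`, and `char_tokenize` are hypothetical names introduced only for illustration. The toy vocabulary deliberately omits "ある" so that the subword split becomes visible, which differs from the real model's vocabulary shown in the decoded output above.

```python
def wordpiece(word, vocab, unk="[UNK]"):
    """Greedy longest-match-first split of one word into subword pieces."""
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        # Try the longest remaining substring first, then shrink it.
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial pieces carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No piece matched: the whole word maps to the unknown token.
            return [unk]
        pieces.append(piece)
        start = end
    return pieces


def char_tokenize(text):
    """Character tokenization: every character is its own token."""
    return list(text)


# Assumption: words as a morphological analyzer might segment the sentence.
words = ["吾輩", "は", "猫", "で", "ある", "。"]

# Assumption: a toy vocabulary, not the real model vocabulary.
toy_vocab = {"吾輩", "は", "猫", "で", "あ", "##る", "。"}

word_level = [piece for word in words for piece in wordpiece(word, toy_vocab)]
char_level = char_tokenize("".join(words))

print(word_level)
print(char_level)
```

In this sketch the word-level path keeps "吾輩" as a single token and only falls back to subword pieces when a word is missing from the vocabulary, while the character-level path always yields one token per character, at the cost of longer sequences.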