Example of using a model with MeCab and WordPiece tokenization:

```python
import torch
from transformers import AutoModel, AutoTokenizer

bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese")
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese")

# Input Japanese text
line = "吾輩は猫である。"

inputs = tokenizer(line, return_tensors="pt")

print(tokenizer.decode(inputs["input_ids"][0]))
# [CLS] 吾輩 は 猫 で ある 。 [SEP]

outputs = bertjapanese(**inputs)
```
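In this tokenization scheme, MeCab first splits the sentence into words (the decoded output above shows the word boundaries: 吾輩 / は / 猫 / で / ある / 。), and each word is then split into subwords by WordPiece. As a minimal sketch of the WordPiece step, here is the greedy longest-match-first subword split over a toy vocabulary (the vocabulary and function name are illustrative assumptions, not the model's actual implementation):

```python
def wordpiece(word, vocab):
    """Greedy longest-match-first subword split, WordPiece-style.

    `vocab` is a toy vocabulary for illustration; real models ship a
    learned vocabulary with tens of thousands of entries.
    """
    pieces = []
    start = 0
    while start < len(word):
        end = len(word)
        piece = None
        while start < end:
            candidate = word[start:end]
            if start > 0:
                # Non-initial subwords carry the "##" continuation prefix.
                candidate = "##" + candidate
            if candidate in vocab:
                piece = candidate
                break
            end -= 1
        if piece is None:
            # No vocabulary entry matches: the whole word becomes unknown.
            return ["[UNK]"]
        pieces.append(piece)
        start = end
    return pieces


toy_vocab = {"un", "aff", "##aff", "##able", "##affable"}
print(wordpiece("unaffable", toy_vocab))  # ['un', '##affable']
```

Because matching is longest-first, "unaffable" becomes `['un', '##affable']` rather than the shorter pieces `['un', '##aff', '##able']`, even though both splits are possible with this vocabulary.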
Example of using a model with Character tokenization:

```python
bertjapanese = AutoModel.from_pretrained("cl-tohoku/bert-base-japanese-char")
tokenizer = AutoTokenizer.from_pretrained("cl-tohoku/bert-base-japanese-char")

# Input Japanese text
line = "吾輩は猫である。"

inputs = tokenizer(line, return_tensors="pt")

print(tokenizer.decode(inputs["input_ids"][0]))
# [CLS] 吾 輩 は 猫 で あ る 。 [SEP]

outputs = bertjapanese(**inputs)
```
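As the decoded output above shows, the character variant emits one token per character. The split itself can be sketched in plain Python (the real tokenizer additionally maps each character to its vocabulary id and adds the `[CLS]`/`[SEP]` special tokens):

```python
line = "吾輩は猫である。"

# Character tokenization simply treats every character as its own token.
tokens = list(line)
print(tokens)  # ['吾', '輩', 'は', '猫', 'で', 'あ', 'る', '。']
```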
This model was contributed by cl-tohoku. |