|
|
|
PhoBERT |
|
Overview |
|
The PhoBERT model was proposed in PhoBERT: Pre-trained language models for Vietnamese by Dat Quoc Nguyen and Anh Tuan Nguyen.
|
The abstract from the paper is the following: |
|
We present PhoBERT with two versions, PhoBERT-base and PhoBERT-large, the first public large-scale monolingual language models pre-trained for Vietnamese. Experimental results show that PhoBERT consistently outperforms the recent best pre-trained multilingual model XLM-R (Conneau et al., 2020) and improves the state-of-the-art in multiple Vietnamese-specific NLP tasks including Part-of-speech tagging, Dependency parsing, Named-entity recognition and Natural language inference.
|
This model was contributed by dqnguyen. The original code can be found here. |
|
Usage example |
|
```python
import torch
from transformers import AutoModel, AutoTokenizer

phobert = AutoModel.from_pretrained("vinai/phobert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

# INPUT TEXT MUST BE ALREADY WORD-SEGMENTED!
line = "Tôi là sinh_viên trường đại_học Công_nghệ ."

input_ids = torch.tensor([tokenizer.encode(line)])

with torch.no_grad():
    features = phobert(input_ids)  # Model outputs are tuples
```
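The underscore-joined tokens above come from a Vietnamese word segmenter (the paper uses the RDRSegmenter from VnCoreNLP). As a minimal sketch of that preprocessing step, assuming the third-party `underthesea` toolkit is installed; its `word_tokenize` helper with `format="text"` joins multi-syllable words with underscores:

```python
# Hypothetical preprocessing sketch: segment raw Vietnamese text into words
# before tokenization, here using the third-party underthesea toolkit
# (pip install underthesea). Multi-syllable words are joined with "_",
# matching the word-segmented input PhoBERT was pre-trained on.
from underthesea import word_tokenize

raw = "Tôi là sinh viên trường đại học Công nghệ ."
line = word_tokenize(raw, format="text")
print(line)  # expected along the lines of: Tôi là sinh_viên trường đại_học Công_nghệ .
```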
|
With TensorFlow 2.0+:

```python
from transformers import TFAutoModel

phobert = TFAutoModel.from_pretrained("vinai/phobert-base")
```
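Instead of building the input tensor by hand with `tokenizer.encode`, the tokenizer can also be called directly. This is the standard `transformers` tokenizer `__call__` API rather than anything PhoBERT-specific:

```python
# Equivalent feature extraction using the tokenizer's __call__ API, which
# builds input_ids and the attention mask in one step.
import torch
from transformers import AutoModel, AutoTokenizer

phobert = AutoModel.from_pretrained("vinai/phobert-base")
tokenizer = AutoTokenizer.from_pretrained("vinai/phobert-base")

inputs = tokenizer("Tôi là sinh_viên trường đại_học Công_nghệ .", return_tensors="pt")
with torch.no_grad():
    features = phobert(**inputs)[0]  # last hidden state, shape (1, seq_len, hidden_size)
```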
|
|
|
|
|
The PhoBERT implementation is the same as BERT, except for tokenization. Refer to the BERT documentation for information on configuration classes and their parameters. The PhoBERT-specific tokenizer is documented below.
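Because the architecture matches BERT, the usual configuration attributes apply unchanged. A minimal sketch using the standard `AutoConfig` API; the printed values are whatever the hub checkpoint defines:

```python
# Inspect the checkpoint's configuration via the standard AutoConfig API;
# the attribute names below are common to BERT-style configs.
from transformers import AutoConfig

config = AutoConfig.from_pretrained("vinai/phobert-base")
print(config.hidden_size, config.num_hidden_layers, config.num_attention_heads)
```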
|
|
|
PhobertTokenizer |
|
[[autodoc]] PhobertTokenizer |