|
|
|
BERTweet |
|
Overview |
|
The BERTweet model was proposed in BERTweet: A pre-trained language model for English Tweets by Dat Quoc Nguyen, Thanh Vu, and Anh Tuan Nguyen.
|
The abstract from the paper is the following: |
|
We present BERTweet, the first public large-scale pre-trained language model for English Tweets. Our BERTweet, having the same architecture as BERT-base (Devlin et al., 2019), is trained using the RoBERTa pre-training procedure (Liu et al., 2019). Experiments show that BERTweet outperforms strong baselines RoBERTa-base and XLM-R-base (Conneau et al., 2020), producing better performance results than the previous state-of-the-art models on three Tweet NLP tasks: Part-of-speech tagging, Named-entity recognition and text classification.
|
This model was contributed by dqnguyen. The original code can be found here. |
|
Usage example |
|
```python
import torch
from transformers import AutoModel, AutoTokenizer

bertweet = AutoModel.from_pretrained("vinai/bertweet-base")

# For transformers v4.x+:
tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base", use_fast=False)

# For transformers v3.x:
# tokenizer = AutoTokenizer.from_pretrained("vinai/bertweet-base")

# INPUT TWEET IS ALREADY NORMALIZED!
line = "SC has first two presumptive cases of coronavirus , DHEC confirms HTTPURL via @USER :cry:"

input_ids = torch.tensor([tokenizer.encode(line)])

with torch.no_grad():
    features = bertweet(input_ids)  # Model outputs are now tuples

# With TensorFlow 2.0+:
# from transformers import TFAutoModel
# bertweet = TFAutoModel.from_pretrained("vinai/bertweet-base")
```
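On recent transformers versions the forward pass returns a model output object that still supports tuple-style indexing. A minimal sketch of reading the per-token features from the call above (the 768 hidden size follows from the BERT-base architecture used by bertweet-base):

```python
# The first element of the output is the last layer's hidden states,
# one 768-dimensional vector per input token.
last_hidden_state = features[0]  # shape: (batch_size, sequence_length, 768)

# On transformers v4.x+ the same tensor is also available as an attribute:
# last_hidden_state = features.last_hidden_state
```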
|
|
|
|
|
This implementation is the same as BERT, except for the tokenization method. Refer to the BERT documentation for API reference information.
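Because only the tokenizer is BERTweet-specific, the normalization that the usage example above assumes can also be delegated to the tokenizer itself. A minimal sketch, assuming the normalization option of the slow BertweetTokenizer (which relies on the emoji package and rewrites user mentions and URLs to @USER and HTTPURL):

```python
from transformers import AutoTokenizer

# normalization=True asks BertweetTokenizer to normalize raw tweets
# (user handles -> @USER, URLs -> HTTPURL, emoji -> text codes)
# before applying BPE, so un-normalized tweets can be passed directly.
tokenizer = AutoTokenizer.from_pretrained(
    "vinai/bertweet-base", use_fast=False, normalization=True
)

raw_line = "SC has first two presumptive cases of coronavirus, DHEC confirms https://t.co/example via @some_user"
print(tokenizer.tokenize(raw_line))
```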
|
|
|
BertweetTokenizer |
|
[[autodoc]] BertweetTokenizer |