	---
	license: openrail
	---

	# StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts

	This is a RoBERTa-base model for sentiment analysis of software engineering texts. It classifies a text as negative, neutral, or positive. It was fine-tuned further from [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) on the [StackOverflow4423](https://arxiv.org/abs/1709.02984) dataset. A demo is available [here](https://huggingface.co/spaces/Cloudy1225/stackoverflow-sentiment-analysis).

	## Example of Pipeline


	```python
	from transformers import pipeline

	MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
	sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
	sentiment_task([
	    "Excellent, happy to help!",
	    "This can probably be done using JavaScript.",
	    "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck.",
	])
	```

	[{'label': 'positive', 'score': 0.9997847676277161},
	{'label': 'neutral', 'score': 0.999783456325531},
	{'label': 'negative', 'score': 0.9996368885040283}]
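
When the pipeline is run over many posts, the returned list of dictionaries can be summarized with plain Python. The following is a minimal sketch, not part of the model or of `transformers`: the helper name `summarize` and the `min_score` threshold are assumptions made for illustration, and the sample data is simply the output shown above.

```python
from collections import Counter

# Sample results copied from the pipeline output above.
results = [
    {'label': 'positive', 'score': 0.9997847676277161},
    {'label': 'neutral', 'score': 0.999783456325531},
    {'label': 'negative', 'score': 0.9996368885040283},
]

def summarize(results, min_score=0.5):
    """Count predicted labels, skipping predictions scored below min_score.

    min_score is a hypothetical confidence threshold, not a model setting.
    """
    return Counter(r['label'] for r in results if r['score'] >= min_score)

counts = summarize(results)
print(dict(counts))
```

Raising `min_score` drops low-confidence predictions from the tally, which may be useful when only clear-cut cases should be counted.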



	## Example of Classification


	```python
	from scipy.special import softmax
	from transformers import AutoTokenizer, AutoModelForSequenceClassification

	def preprocess(text):
	    """Preprocess text (username and link placeholders)"""
	    new_text = []
	    for t in text.split(' '):
	        t = '@user' if t.startswith('@') and len(t) > 1 else t
	        t = 'http' if t.startswith('http') else t
	        new_text.append(t)
	    return ' '.join(new_text).strip()

	MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
	tokenizer = AutoTokenizer.from_pretrained(MODEL)
	model = AutoModelForSequenceClassification.from_pretrained(MODEL)

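	text = "Excellent, happy to help!"
	text = preprocess(text)
	# Tokenize into PyTorch tensors and run the classifier.
	encoded_input = tokenizer(text, return_tensors='pt')
	output = model(**encoded_input)
	# output[0] holds the logits; [0] selects the single input in the batch.
	scores = output[0][0].detach().numpy()
	# Convert the logits to probabilities that sum to 1.
	scores = softmax(scores)
	# Class indices: 0 = negative, 1 = neutral, 2 = positive.
	print("negative", scores[0])
	print("neutral", scores[1])
	print("positive", scores[2])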
	```

	negative 0.00015578205
	neutral 5.9470447e-05
	positive 0.99978495
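
The `softmax` step and the label order used in the `print` calls (index 0 negative, 1 neutral, 2 positive) can be illustrated without downloading the model. This is a sketch using only the standard library: the logits are made-up values, and `LABELS`, `softmax`, and `top_label` are illustrative names rather than anything provided by `transformers` or `scipy`.

```python
import math

# Label order taken from the print statements above: indices 0, 1, 2.
LABELS = ["negative", "neutral", "positive"]

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    peak = max(logits)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_label(logits):
    """Return (label, probability) for the highest-scoring class."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]

# Hypothetical logits, not produced by the model.
example_logits = [-3.0, -4.0, 5.0]
label, prob = top_label(example_logits)
print(label, round(prob, 4))
```

Since the pipeline output above already reports the names `positive`, `neutral`, and `negative`, the model's configuration presumably carries this mapping, so reading it from `model.config.id2label` is likely safer than hard-coding the indices.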



	## Acknowledgments

	This project was developed as part of the Software Engineering and Computing III course at the Software Institute, Nanjing University, in Spring 2023. For more on sentiment analysis of software engineering texts, see the following paper:

	```
	@inproceedings{sun2022incorporating,
	title={Incorporating Pre-trained Transformer Models into TextCNN for Sentiment Analysis on Software Engineering Texts},
	author={Sun, Kexin and Shi, Xiaobo and Gao, Hui and Kuang, Hongyu and Ma, Xiaoxing and Rong, Guoping and Shao, Dong and Zhao, Zheng and Zhang, He},
	booktitle={Proceedings of the 13th Asia-Pacific Symposium on Internetware},
	pages={127--136},
	year={2022}
	}
	```