---
license: openrail
---

# StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts

This is a RoBERTa-base model for sentiment analysis on software engineering texts. It was further fine-tuned from [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) on the [StackOverflow4423](https://arxiv.org/abs/1709.02984) dataset. It classifies text as `negative`, `neutral`, or `positive`. A demo is available [on Hugging Face Spaces](https://huggingface.co/spaces/Cloudy1225/stackoverflow-sentiment-analysis).

## Example of Pipeline

```python
from transformers import pipeline

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)

# The pipeline returns the top label and its score for each input text
results = sentiment_task(["Excellent, happy to help!",
                          "This can probably be done using JavaScript.",
                          "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])
print(results)
# [{'label': 'positive', 'score': 0.9997847676277161},
#  {'label': 'neutral', 'score': 0.999783456325531},
#  {'label': 'negative', 'score': 0.9996368885040283}]
```
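
For a model with several labels like this one, the text-classification pipeline applies a softmax to the raw logits by default. The `score` it reports is the probability of the highest-scoring class. The NumPy sketch below mimics that post-processing step without loading the model. The `top_label` helper and the logit values are hypothetical. The label order `negative, neutral, positive` is assumed from the classification example.

```python
import numpy as np

# Assumed label order, matching the indices used in the classification example
LABELS = ["negative", "neutral", "positive"]


def top_label(logits):
    """Hypothetical helper: return the top label and its softmax probability."""
    logits = np.asarray(logits, dtype=np.float64)
    # Subtract the max logit before exponentiating for numerical stability
    exps = np.exp(logits - logits.max())
    probs = exps / exps.sum()
    best = int(np.argmax(probs))
    return {"label": LABELS[best], "score": float(probs[best])}


# Made-up logits for one input (not produced by the real model)
print(top_label([-2.0, 0.5, 4.0]))
```

Subtracting the maximum logit does not change the probabilities, but it avoids overflow when logits are large. `scipy.special.softmax`, used in the classification example, yields the same values.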

## Example of Classification

```python
import torch
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification


def preprocess(text):
    """Preprocess text (username and link placeholders)"""
    new_text = []
    for t in text.split(' '):
        # Replace @mentions with a generic '@user' token
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        # Replace links with a generic 'http' token
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()


MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Excellent, happy to help!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
# Inference only, so gradient tracking is not needed
with torch.no_grad():
    output = model(**encoded_input)
# Logits for the first (and only) input, converted to probabilities
scores = output.logits[0].numpy()
scores = softmax(scores)
# Label order: 0 = negative, 1 = neutral, 2 = positive
print("negative", scores[0])
print("neutral", scores[1])
print("positive", scores[2])
# negative 0.00015578205
# neutral 5.9470447e-05
# positive 0.99978495
```
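
`preprocess` follows the Twitter base model's input conventions. It replaces `@mentions` with `@user` and replaces any space-separated token starting with `http` with `http`. The standalone check below shows what it does to a made-up Stack Overflow-style comment. The function is copied unchanged from the example above.

```python
def preprocess(text):
    """Replace @mentions with '@user' and links with 'http' (same rules as above)."""
    new_text = []
    for t in text.split(' '):
        # A lone '@' (length 1) is not treated as a mention
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()


# Made-up comment text, for illustration only
print(preprocess("@alice see https://stackoverflow.com/help for details"))
# → @user see http for details
print(preprocess("Email me @ noon"))
# → Email me @ noon
```

The function splits only on single spaces. As a result, a link wrapped in punctuation, such as `(https://example.com)`, or one preceded by a newline is left unchanged.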

## Acknowledgments

This project was developed as part of the **Software Engineering and Computing III** course at the Software Institute, Nanjing University, in Spring 2023. For more on sentiment analysis of software engineering texts, see the following paper:

```bibtex
@inproceedings{sun2022incorporating,
  title={Incorporating Pre-trained Transformer Models into TextCNN for Sentiment Analysis on Software Engineering Texts},
  author={Sun, Kexin and Shi, Xiaobo and Gao, Hui and Kuang, Hongyu and Ma, Xiaoxing and Rong, Guoping and Shao, Dong and Zhao, Zheng and Zhang, He},
  booktitle={Proceedings of the 13th Asia-Pacific Symposium on Internetware},
  pages={127--136},
  year={2022}
}
```