license: openrail
StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts
This is a RoBERTa-base model for sentiment analysis on software engineering texts. It was further fine-tuned from cardiffnlp/twitter-roberta-base-sentiment on the StackOverflow4423 dataset. You can access the demo here.
Pipeline Example
from transformers import pipeline
MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
results = sentiment_task(["Excellent, happy to help!",
                          "This can probably be done using JavaScript.",
                          "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])
print(results)
# Output:
# [{'label': 'positive', 'score': 0.9997847676277161},
#  {'label': 'neutral', 'score': 0.999783456325531},
#  {'label': 'negative', 'score': 0.9996368885040283}]
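The pipeline returns one dictionary per input, holding the top `label` and its `score`. As a minimal sketch that is not part of the original card, these results can be summarized across a batch of posts using only the standard library. `summarize` and its `min_score` threshold are hypothetical names chosen for illustration. The sample input reuses the output shown above, so the snippet runs without downloading the model.

```python
from collections import Counter

def summarize(results, min_score=0.5):
    """Count predicted labels, skipping predictions below min_score (hypothetical helper)."""
    confident = [r["label"] for r in results if r["score"] >= min_score]
    return Counter(confident)

# Pipeline output copied from the example above.
results = [
    {"label": "positive", "score": 0.9997847676277161},
    {"label": "neutral", "score": 0.999783456325531},
    {"label": "negative", "score": 0.9996368885040283},
]

counts = summarize(results)
print(dict(counts))  # {'positive': 1, 'neutral': 1, 'negative': 1}
```

Raising `min_score` drops low-confidence predictions from the tally instead of forcing every post into a bucket.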
Classification Example
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification
def preprocess(text):
    """Preprocess text (username and link placeholders)"""
    new_text = []
    for t in text.split(' '):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()
MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)
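text = "Excellent, happy to help!"
text = preprocess(text)  # replace @mentions and URLs with placeholders
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()  # raw logits for the single input
scores = softmax(scores)  # convert logits to probabilities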
print("negative", scores[0])
print("neutral", scores[1])
print("positive", scores[2])
# Output:
# negative 0.00015578205
# neutral 5.9470447e-05
# positive 0.99978495
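The pre- and post-processing steps in this example are plain Python and can be checked without loading the model. The sketch below is an illustration, not part of the original card. It copies `preprocess` from above and replaces `scipy.special.softmax` with a hand-written, numerically stable NumPy version. `stable_softmax` and `top_label` are hypothetical helpers. The sketch assumes the output order negative, neutral, positive, matching the `print` calls above; if the model's config defines `id2label`, that mapping is the more reliable source. The logits are made-up values, not real model output.

```python
import numpy as np

# Label order assumed from the print statements in the example above.
LABELS = ["negative", "neutral", "positive"]

def preprocess(text):
    """Preprocess text (username and link placeholders), copied from the example above."""
    new_text = []
    for t in text.split(' '):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()

def stable_softmax(logits):
    """Softmax that subtracts the max logit first so exp() cannot overflow."""
    shifted = np.asarray(logits, dtype=np.float64) - np.max(logits)
    exps = np.exp(shifted)
    return exps / exps.sum()

def top_label(logits):
    """Return the highest-probability label and its probability (hypothetical helper)."""
    probs = stable_softmax(logits)
    idx = int(np.argmax(probs))
    return LABELS[idx], float(probs[idx])

print(preprocess("@alice thanks, see https://example.com/docs"))
# @user thanks, see http

# Made-up logits for illustration only, not real model output.
label, prob = top_label([-1.0, 0.5, 2.0])
print(label, round(prob, 3))  # positive 0.786
```

Subtracting the maximum logit does not change the softmax result, because the shift cancels in the ratio. It only keeps the exponentials in a safe range.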