Cloudy1225/stackoverflow-roberta-base-sentiment

Text Classification


Cloudy1225 committed on Jun 3, 2023

Commit 32b5747 · 1 parent: b4255f4

Update README.md

Files changed (1)

README.md +59 -0


@@ -1,3 +1,62 @@
 ---
 license: openrail
 ---

+# StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts
+This is a RoBERTa-base model for sentiment analysis on software engineering texts: it classifies a text as negative, neutral, or positive. It was obtained by further fine-tuning [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) on the [StackOverflow4423](https://arxiv.org/abs/1709.02984) dataset. A demo is available [here](https://huggingface.co/spaces/Cloudy1225/stackoverflow-sentiment-analysis).
+## Example with `pipeline`
+```python
+from transformers import pipeline
+MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
+sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
+sentiment_task(["Excellent, happy to help!",
+                "This can probably be done using JavaScript.",
+                "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])
+```
+    [{'label': 'positive', 'score': 0.9997847676277161},
+     {'label': 'neutral', 'score': 0.999783456325531},
+     {'label': 'negative', 'score': 0.9996368885040283}]
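The pipeline returns one `{'label': ..., 'score': ...}` dict per input, in input order, where `score` is the probability of the top label. A minimal sketch of post-processing that list follows, assuming you want per-label counts and a list of low-confidence predictions; the `summarize` helper and its `0.5` threshold are hypothetical choices for illustration, not part of the model or of `transformers`:

```python
from collections import Counter


def summarize(results, threshold=0.5):
    """Hypothetical helper: count predicted labels and flag uncertain items.

    `results` is a list of {'label': str, 'score': float} dicts, as returned
    by the sentiment-analysis pipeline for a list of inputs.
    """
    counts = Counter(r['label'] for r in results)
    # Indices of predictions whose top score falls below the assumed cut-off
    uncertain = [i for i, r in enumerate(results) if r['score'] < threshold]
    return counts, uncertain


# The three predictions shown above, copied verbatim
results = [{'label': 'positive', 'score': 0.9997847676277161},
           {'label': 'neutral', 'score': 0.999783456325531},
           {'label': 'negative', 'score': 0.9996368885040283}]
counts, uncertain = summarize(results)
print(dict(counts))  # {'positive': 1, 'neutral': 1, 'negative': 1}
print(uncertain)     # []
```

With three classes the top score is always at least 1/3, so a cut-off such as 0.5 separates clear predictions from near-ties; the right value depends on your data.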
+## Example with `AutoModelForSequenceClassification`
+```python
+import torch
+from scipy.special import softmax
+from transformers import AutoTokenizer, AutoModelForSequenceClassification
+
+def preprocess(text):
+    """Replace user mentions with '@user' and links with 'http' placeholders."""
+    new_text = []
+    for t in text.split(' '):
+        t = '@user' if t.startswith('@') and len(t) > 1 else t
+        t = 'http' if t.startswith('http') else t
+        new_text.append(t)
+    return ' '.join(new_text).strip()
+
+MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
+tokenizer = AutoTokenizer.from_pretrained(MODEL)
+model = AutoModelForSequenceClassification.from_pretrained(MODEL)
+
+text = preprocess("Excellent, happy to help!")
+encoded_input = tokenizer(text, return_tensors='pt')
+# Inference only: no gradients needed
+with torch.no_grad():
+    output = model(**encoded_input)
+# Logits for the single input in the batch, converted to probabilities
+scores = softmax(output.logits[0].numpy())
+# Label order: 0 = negative, 1 = neutral, 2 = positive
+print("negative", scores[0])
+print("neutral", scores[1])
+print("positive", scores[2])
+```
+    negative 0.00015578205
+    neutral 5.9470447e-05
+    positive 0.99978495
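The classification example converts the model's raw logits into probabilities with `softmax` and reads them in the order negative, neutral, positive. A minimal, dependency-free sketch of that step follows, assuming that label order (the one the example prints); the `LABELS` tuple, the `classify_logits` helper, and the logit values are hypothetical illustrations, not output from the model:

```python
import math

# Label order assumed from the example above: indices 0, 1, 2
LABELS = ("negative", "neutral", "positive")


def softmax(logits):
    """Numerically stable softmax: subtract the max before exponentiating."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_logits(logits):
    """Hypothetical helper: map one row of logits to (label, probability)."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return LABELS[best], probs[best]


# Made-up logits for illustration only
logits = [-2.0, 0.5, 3.0]
probs = softmax(logits)
label, p = classify_logits(logits)
print(label)  # positive
```

Subtracting the maximum logit does not change the result, because softmax is invariant to adding the same constant to every input, but it keeps `math.exp` from overflowing when logits are large.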