---
license: openrail
---

# StackOverflow-RoBERTa-base for Sentiment Analysis on Software Engineering Texts
This is a RoBERTa-base model for sentiment analysis on software engineering texts. It was further fine-tuned from [cardiffnlp/twitter-roberta-base-sentiment](https://huggingface.co/cardiffnlp/twitter-roberta-base-sentiment) on the [StackOverflow4423](https://arxiv.org/abs/1709.02984) dataset. A live demo is available [here](https://huggingface.co/spaces/Cloudy1225/stackoverflow-sentiment-analysis).

## Example of Pipeline

```python
from transformers import pipeline

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
sentiment_task = pipeline(task="sentiment-analysis", model=MODEL)
sentiment_task(["Excellent, happy to help!",
                "This can probably be done using JavaScript.",
                "Yes, but it's tricky, since datetime parsing in SQL is a pain in the neck."])
```

```
[{'label': 'positive', 'score': 0.9997847676277161},
 {'label': 'neutral', 'score': 0.999783456325531},
 {'label': 'negative', 'score': 0.9996368885040283}]
```
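The pipeline returns one dict per input, containing only the top-scoring label and its score. A minimal post-processing sketch, reusing the output values shown above (the 0.99 confidence threshold is an illustrative choice, not part of the model):

```python
# Output values copied from the pipeline call above
results = [{'label': 'positive', 'score': 0.9997847676277161},
           {'label': 'neutral', 'score': 0.999783456325531},
           {'label': 'negative', 'score': 0.9996368885040283}]

# Keep only predictions above a confidence threshold
confident = [r['label'] for r in results if r['score'] > 0.99]
print(confident)  # → ['positive', 'neutral', 'negative']
```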

## Example of Classification

```python
from scipy.special import softmax
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def preprocess(text):
    """Preprocess text (username and link placeholders)"""
    new_text = []
    for t in text.split(' '):
        t = '@user' if t.startswith('@') and len(t) > 1 else t
        t = 'http' if t.startswith('http') else t
        new_text.append(t)
    return ' '.join(new_text).strip()

MODEL = 'Cloudy1225/stackoverflow-roberta-base-sentiment'
tokenizer = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForSequenceClassification.from_pretrained(MODEL)

text = "Excellent, happy to help!"
text = preprocess(text)
encoded_input = tokenizer(text, return_tensors='pt')
output = model(**encoded_input)
scores = output[0][0].detach().numpy()
scores = softmax(scores)
print("negative", scores[0])
print("neutral", scores[1])
print("positive", scores[2])
```

```
negative 0.00015578205
neutral 5.9470447e-05
positive 0.99978495
```
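The three probabilities can be reduced to a single predicted label with an argmax. A minimal sketch assuming the same negative/neutral/positive index order as the prints above; the logits here are made-up placeholders, not real model outputs:

```python
import numpy as np
from scipy.special import softmax

labels = ['negative', 'neutral', 'positive']  # index order used by the prints above

# Placeholder logits standing in for output[0][0].detach().numpy()
scores = softmax(np.array([-2.0, -3.5, 4.1]))

prediction = labels[int(np.argmax(scores))]
print(prediction)  # → positive
```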