Ngit committed on
Commit
05d7726
·
1 Parent(s): 1c1b754

Create README.md

Files changed (1)
  1. README.md +123 -0
README.md ADDED
@@ -0,0 +1,123 @@
---
datasets:
- go_emotions
language:
- en
library_name: transformers
model-index:
- name: text-classification-goemotions
  results:
  - task:
      name: Text Classification
      type: text-classification
    dataset:
      name: go_emotions
      type: multilabel_classification
      config: simplified
      split: test
      args: simplified
    metrics:
    - name: F1
      type: f1
      value: 0.487
---

# Text Classification GoEmotions

This model is an ONNX-quantized, fine-tuned version of [nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large](https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-RoBERTa-Large) on the [go_emotions](https://huggingface.co/datasets/go_emotions) dataset, trained with [tasinhoque/text-classification-goemotions](https://huggingface.co/tasinhoque/text-classification-goemotions) as the teacher model.

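The card does not say which loss was used to learn from the teacher. As a rough illustration only, one common way to distill a multi-label classifier is to treat the teacher's sigmoid probabilities as soft targets for a binary cross-entropy on the student's outputs. The function name `soft_label_bce` and the toy logits below are hypothetical and are not taken from the training code.

```python
import numpy as np


def sigmoid(x):
    """Element-wise logistic function."""
    return 1.0 / (1.0 + np.exp(-x))


def soft_label_bce(student_logits, teacher_logits, eps=1e-7):
    """Mean binary cross-entropy of student probabilities against teacher probabilities.

    Assumption: this is a generic soft-label loss, not the loss documented for this model.
    """
    p_teacher = sigmoid(np.asarray(teacher_logits, dtype=np.float64))
    p_student = sigmoid(np.asarray(student_logits, dtype=np.float64))
    p_student = np.clip(p_student, eps, 1.0 - eps)  # avoid log(0)
    loss = -(p_teacher * np.log(p_student) + (1.0 - p_teacher) * np.log(1.0 - p_student))
    return float(loss.mean())


# Toy logits for one example with three labels (made up for illustration).
teacher = np.array([[2.0, -3.0, 0.5]])
close_student = np.array([[1.8, -2.9, 0.4]])
far_student = np.array([[-2.0, 3.0, -0.5]])

loss_close = soft_label_bce(close_student, teacher)
loss_far = soft_label_bce(far_student, teacher)
print(loss_close, loss_far)
```

A student whose logits track the teacher's gets a lower loss than one that disagrees, which is the signal a distillation setup of this kind would optimize.
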
# Load the Model

```py
import json
import os

import numpy as np
from onnxruntime import InferenceSession
from tokenizers import Tokenizer

# Clone the repository first so the ONNX file and config.json exist locally:
# !git clone https://huggingface.co/Ngit/MiniLMv2-L6-H384-goemotions-v2-onnx

model_name = "Ngit/MiniLMv2-L6-H384-goemotions-v2-onnx"
model_dir = "MiniLMv2-L6-H384-goemotions-v2-onnx"

tokenizer = Tokenizer.from_pretrained(model_name)
tokenizer.enable_padding(
    pad_token="<pad>",
    pad_id=1,
)
tokenizer.enable_truncation(max_length=256)
batch_size = 16

texts = ["I am angry"]
outputs = []
# os.path.join keeps the path portable; the CPU provider is listed as a fallback
# for machines without CUDA.
model = InferenceSession(
    os.path.join(model_dir, "model_optimized_quantized.onnx"),
    providers=["CUDAExecutionProvider", "CPUExecutionProvider"],
)

with open(os.path.join(model_dir, "config.json"), "r") as f:
    config = json.load(f)

output_names = [output.name for output in model.get_outputs()]
input_names = [model_input.name for model_input in model.get_inputs()]

# Split the texts into chunks of at most batch_size items.
for subtexts in np.array_split(np.array(texts), len(texts) // batch_size + 1):
    encodings = tokenizer.encode_batch(subtexts.tolist())
    inputs = {
        "input_ids": np.array(
            [encoding.ids for encoding in encodings], dtype=np.int64
        ),
        "attention_mask": np.array(
            [encoding.attention_mask for encoding in encodings], dtype=np.int64
        ),
        "token_type_ids": np.array(
            [encoding.type_ids for encoding in encodings], dtype=np.int64
        ),
    }

    for input_name in input_names:
        if input_name not in inputs:
            raise ValueError(f"Input name {input_name} not found in inputs")

    # Keep only the inputs that the ONNX graph declares.
    inputs = {input_name: inputs[input_name] for input_name in input_names}
    output = np.squeeze(
        np.stack(
            model.run(output_names=output_names, input_feed=inputs)
        ),
        axis=0,
    )
    outputs.append(output)

outputs = np.concatenate(outputs, axis=0)
# Sigmoid turns each logit into an independent per-label probability.
scores = 1 / (1 + np.exp(-outputs))
results = []
for item in scores:
    labels = []
    item_scores = []
    for idx, s in enumerate(item):
        labels.append(config["id2label"][str(idx)])
        item_scores.append(float(s))
    results.append({"labels": labels, "scores": item_scores})

results
```
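The loading code returns a score for every label of every text. Because the dataset is multi-label, a typical next step is to keep the labels whose score passes a cutoff. The sketch below assumes a threshold of 0.5, which the card does not specify; the helper name `predicted_labels` and the example scores are hypothetical.

```python
def predicted_labels(result, threshold=0.5):
    """Return (label, score) pairs whose score reaches the threshold, highest first."""
    pairs = zip(result["labels"], result["scores"])
    kept = [(label, score) for label, score in pairs if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Hypothetical entry in the same shape as the items of `results`;
# these scores are made up for illustration and are not model output.
example = {"labels": ["anger", "annoyance", "neutral"], "scores": [0.91, 0.62, 0.03]}

top = predicted_labels(example)
print(top)
```

The threshold is worth tuning on the validation split, since a lower cutoff trades precision for recall.
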

# Training hyperparameters

The following hyperparameters were used during training:
- learning_rate: 6e-05
- train_batch_size: 64
- eval_batch_size: 64
- seed: 42
- optimizer: Adam with betas=(0.9,0.999) and epsilon=1e-08
- lr_scheduler_type: linear
- num_epochs: 40

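To make the `linear` scheduler setting concrete, here is a minimal sketch of how the learning rate would decay from 6e-05 to zero over training. It assumes no warmup phase and a hypothetical 100 optimizer steps per epoch; the card states neither, so only the shape of the curve carries over.

```python
def linear_lr(step, total_steps, base_lr=6e-05):
    """Learning rate after `step` optimizer steps under linear decay to zero (no warmup assumed)."""
    if not 0 <= step <= total_steps:
        raise ValueError("step must lie in [0, total_steps]")
    return base_lr * (total_steps - step) / total_steps


# Hypothetical run length: the card gives 40 epochs but not the steps per epoch.
steps_per_epoch = 100
total_steps = 40 * steps_per_epoch

for epoch in (0, 10, 20, 40):
    print(epoch, linear_lr(epoch * steps_per_epoch, total_steps))
```

Under these assumptions the rate is halved by epoch 20 and reaches zero at the end of epoch 40.
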
# Metrics (comparison with teacher model)

| Teacher (params) | Student (params) | Set | Score (teacher) | Score (student) |
|------------------|------------------|-----|-----------------|-----------------|
| tasinhoque/text-classification-goemotions (355M) | MiniLMv2-L6-H384-goemotions-v2 | Validation | 0.514252 | 0.484898 |
| tasinhoque/text-classification-goemotions (33M) | MiniLMv2-L6-H384-goemotions-v2 (original model) | Test | 0.501937 | 0.486890 |

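The comparison is easier to read as a gap and a ratio. The short calculation below uses only the four scores from the table; the helper name `retention` is introduced here for illustration.

```python
# Scores copied from the table above: (teacher, student) per evaluation set.
scores = {
    "Validation": (0.514252, 0.484898),
    "Test": (0.501937, 0.486890),
}


def retention(teacher_score, student_score):
    """Fraction of the teacher's score that the student reaches."""
    return student_score / teacher_score


for split, (teacher, student) in scores.items():
    gap = teacher - student
    print(f"{split}: gap = {gap:.6f}, retention = {retention(teacher, student):.1%}")
```

The student keeps roughly 94% of the teacher's score on the validation set and roughly 97% on the test set.
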
# Training Code, Evaluation & Deployment

Check