# roberta-finetuned-emotion-multilabel-tf
This model is a fine-tuned version of roberta-base on the GoEmotions dataset. It has been trained to perform multi-label text classification to detect one or more of 14 different emotions from a given text.
This model was trained as part of a final project for an AI module, demonstrating the end-to-end process of data analysis, preprocessing, model fine-tuning with TensorFlow/Keras, evaluation, and deployment on the Hugging Face Hub.
It achieves the following results on the evaluation set:
- Macro F1-Score: 0.5123
## Model description
This is a `roberta-base` model fine-tuned for multi-label emotion classification. The model takes a text as input and outputs a probability score for each of the following 14 emotions:
- amusement
- anger
- annoyance
- caring
- confusion
- disappointment
- disgust
- embarrassment
- excitement
- fear
- gratitude
- joy
- love
- sadness
Since this is a multi-label classification task, a sigmoid function is applied to the model's output logits to produce an independent probability for each emotion, and the model is trained with a binary cross-entropy loss.
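As a quick illustration of the distinction, the toy snippet below (hypothetical logit values, not actual model output) applies a sigmoid to three of the fourteen outputs independently, so several emotions can exceed a decision threshold at once, whereas a softmax would force the scores to compete and sum to 1:

```python
import tensorflow as tf

# Hypothetical logits for 3 of the 14 emotion outputs.
# Sigmoid scores each label independently, so multiple emotions
# can be "active" for the same text.
logits = tf.constant([[2.3, 1.1, -0.7]])
print(tf.sigmoid(logits).numpy())  # approx. [[0.909 0.750 0.332]]
```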
## Intended uses & limitations
### How to use

You can use this model with the `text-classification` pipeline. Since it's a multi-label model, it's recommended to pass the `top_k=None` argument to see the scores for all labels.
```python
from transformers import pipeline

# Load the model from the Hub
classifier = pipeline("text-classification", model="athallabf/fp-ai-modul-6", top_k=None)

text = "I can't believe I won the lottery! This is the best day of my life!"
predictions = classifier(text)

# Apply a threshold to filter relevant emotions
threshold = 0.35  # This threshold was tuned on the validation set

for pred in predictions[0]:
    if pred['score'] > threshold:
        print(f"Label: {pred['label']}, Score: {pred['score']:.4f}")

# Expected output:
# Label: joy, Score: 0.9876
# Label: excitement, Score: 0.9754
# Label: amusement, Score: 0.4532
```
### Limitations
- The model was trained on the GoEmotions dataset, which primarily consists of English text from Reddit comments. Its performance on other domains (e.g., formal text, poetry, other languages) may be suboptimal.
- The dataset has a significant class imbalance. The model performs better on common emotions like `joy` and `amusement` and may struggle with rare emotions like `embarrassment` or `disgust`.
- The `roberta-base` architecture is smaller and faster but may be less accurate than larger models like `roberta-large`.
## Training and evaluation data
The model was fine-tuned on the GoEmotions dataset, a human-annotated dataset of 58k Reddit comments labeled with 27 emotion categories. For this project, a subset of 14 primary emotions was used.
The data was split into:
- Training set: 37,164 samples
- Validation set: 9,291 samples
Preprocessing steps included lowercasing and tokenization with a max length of 128.
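For reference, a minimal sketch of this preprocessing step, assuming the GoEmotions splits are loaded with the `datasets` library (illustrative only; the exact project code may differ):

```python
from datasets import load_dataset
from transformers import AutoTokenizer

dataset = load_dataset("go_emotions")  # Reddit comments with emotion labels
tokenizer = AutoTokenizer.from_pretrained("roberta-base")

def preprocess(batch):
    # Lowercase, then tokenize with truncation/padding to a max length of 128
    texts = [t.lower() for t in batch["text"]]
    return tokenizer(texts, truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(preprocess, batched=True)
```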
## Training procedure

### Training hyperparameters
The following hyperparameters were used during training:
- optimizer: {'name': 'AdamW', 'weight_decay': 0.0, 'clipnorm': None, 'global_clipnorm': None, 'clipvalue': None, 'use_ema': False, 'ema_momentum': 0.99, 'ema_overwrite_frequency': None, 'jit_compile': True, 'is_legacy_optimizer': False, 'learning_rate': 2e-05, 'beta_1': 0.9, 'beta_2': 0.999, 'epsilon': 1e-07, 'amsgrad': False}
- training_precision: float32
- epochs: 4
- batch_size: 32
- loss_function: BinaryCrossentropy (from_logits=True)
An `EarlyStopping` callback was used to monitor `val_loss` with a patience of 2, restoring the best weights at the end of training.
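A minimal sketch of this training setup, assuming `train_ds` and `val_ds` are `tf.data.Dataset` objects batched with the batch size above (illustrative, not the exact project code):

```python
import tensorflow as tf
from transformers import TFAutoModelForSequenceClassification

# roberta-base encoder with a 14-way classification head that outputs logits
model = TFAutoModelForSequenceClassification.from_pretrained(
    "roberta-base", num_labels=14
)

model.compile(
    optimizer=tf.keras.optimizers.AdamW(learning_rate=2e-5, weight_decay=0.0),
    loss=tf.keras.losses.BinaryCrossentropy(from_logits=True),
)

early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", patience=2, restore_best_weights=True
)

# train_ds / val_ds: assumed pre-batched (batch_size=32) tokenized datasets
model.fit(train_ds, validation_data=val_ds, epochs=4, callbacks=[early_stopping])
```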
### Training results
The final model achieved a Macro F1-Score of 0.5123 on the validation set after tuning the prediction threshold to 0.35.
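For completeness, here is a sketch of how such a threshold can be selected, assuming `val_probs` (sigmoid probabilities, shape `(n_samples, 14)`) and `val_labels` (multi-hot ground truth) have already been computed on the validation set:

```python
import numpy as np
from sklearn.metrics import f1_score

# val_probs, val_labels: assumed precomputed validation arrays (see above)
best_threshold, best_f1 = 0.5, 0.0
for threshold in np.arange(0.05, 0.95, 0.05):
    preds = (val_probs >= threshold).astype(int)
    macro_f1 = f1_score(val_labels, preds, average="macro", zero_division=0)
    if macro_f1 > best_f1:
        best_threshold, best_f1 = threshold, macro_f1

print(f"Best threshold: {best_threshold:.2f}, Macro F1: {best_f1:.4f}")
```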
### Framework versions
- Transformers 4.41.2
- TensorFlow 2.16.1
- Datasets 2.19.0
- Tokenizers 0.19.1