# poltextlab/xlm-roberta-large-hungarian-sentiment-v2

## Model description
This model is based on XLM-RoBERTa Large and fine-tuned for Hungarian sentiment analysis.
It classifies text into three sentiment categories:
- 0 → Negative
- 1 → Neutral
- 2 → Positive
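For clarity, here is a minimal sketch of how these integer indices come out of the raw model, assuming the 0/1/2 mapping above (the `id2label` dict below is taken from this card, not read from the checkpoint's config; the pipeline example in the Usage section is the simpler route):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "poltextlab/xlm-roberta-large-hungarian-sentiment-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Assumed index-to-label mapping, copied from the list above.
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}

text = "Ez egy fantasztikus nap volt!"  # "This was a fantastic day!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits       # shape: (1, 3)
probs = logits.softmax(dim=-1).squeeze(0)  # class probabilities
pred = int(probs.argmax())
print(id2label[pred], float(probs[pred]))
```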
## Training data
The model was trained on a mix of original and synthetically generated Hungarian texts.
Synthetic samples were introduced to improve class balance and robustness.
## Performance

### Overall metrics
- Accuracy: 0.8530
- Precision: 0.8449
- Recall: 0.8530
- F1 Score: 0.8469
### Classification Report
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (Negative) | 0.89 | 0.92 | 0.91 | 1866 |
| 1 (Neutral) | 0.65 | 0.50 | 0.56 | 583 |
| 2 (Positive) | 0.86 | 0.92 | 0.89 | 1341 |

| Metric | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Accuracy | – | – | 0.85 | 3790 |
| Macro avg | 0.80 | 0.78 | 0.79 | 3790 |
| Weighted avg | 0.84 | 0.85 | 0.85 | 3790 |
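As a sanity check on the averaging, the macro and weighted rows can be reproduced from the per-class table; a minimal sketch in plain Python, using the rounded per-class values copied from the report above (because the inputs are rounded to two decimals, the weighted recall comes out 0.86 here versus the table's 0.85, which was computed from exact values):

```python
# Per-class (precision, recall, f1, support) from the classification report.
classes = {
    "Negative": (0.89, 0.92, 0.91, 1866),
    "Neutral":  (0.65, 0.50, 0.56, 583),
    "Positive": (0.86, 0.92, 0.89, 1341),
}

total = sum(s for *_, s in classes.values())  # 3790

# Macro average: unweighted mean over the three classes.
macro = [sum(v[i] for v in classes.values()) / len(classes) for i in range(3)]

# Weighted average: mean weighted by class support.
weighted = [sum(v[i] * v[3] for v in classes.values()) / total for i in range(3)]

print("macro    P/R/F1:", [round(x, 2) for x in macro])     # [0.8, 0.78, 0.79]
print("weighted P/R/F1:", [round(x, 2) for x in weighted])  # [0.84, 0.86, 0.85]
```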
### Confusion Matrix

*(Shown as an image in the original model card; not reproduced here.)*
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "poltextlab/xlm-roberta-large-hungarian-sentiment-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(clf("Ez egy fantasztikus nap volt!"))  # "This was a fantastic day!"      -> expected: Positive
print(clf("Ez borzalmas élmény volt."))      # "This was a horrible experience." -> expected: Negative
```
## License
MIT License
## Limitations and notes

- The neutral class performs noticeably worse than the positive and negative classes (F1 of 0.56 vs. 0.91 and 0.89).
- Performance may vary with text length, domain, and context.
- Because part of the training data is synthetic, some linguistic patterns may be over- or underrepresented.