# poltextlab/xlm-roberta-large-hungarian-sentiment-v2

## Model description
This model is based on XLM-RoBERTa Large and fine-tuned for Hungarian sentiment analysis.
It classifies text into three sentiment categories:
- 0 → Negative
- 1 → Neutral
- 2 → Positive
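For clarity, here is a minimal sketch of how these integer indices come out of the raw model, assuming the 0/1/2 mapping above (the `id2label` dict below is taken from this card, not read from the checkpoint's config; the pipeline example in the Usage section is the simpler route):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "poltextlab/xlm-roberta-large-hungarian-sentiment-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

# Assumed index-to-label mapping, copied from the list above.
id2label = {0: "Negative", 1: "Neutral", 2: "Positive"}

text = "Ez egy fantasztikus nap volt!"  # "This was a fantastic day!"
inputs = tokenizer(text, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits       # shape: (1, 3)
probs = logits.softmax(dim=-1).squeeze(0)  # class probabilities
pred = int(probs.argmax())
print(id2label[pred], float(probs[pred]))
```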
## Training data
The model was trained on a mix of original and synthetically generated Hungarian texts.
Synthetic samples were introduced to improve class balance and robustness.
## Performance

### Overall metrics
- Accuracy: 0.8530
- Precision: 0.8449
- Recall: 0.8530
- F1 Score: 0.8469
### Classification Report
| Label | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| 0 (Negative) | 0.89 | 0.92 | 0.91 | 1866 |
| 1 (Neutral) | 0.65 | 0.50 | 0.56 | 583 |
| 2 (Positive) | 0.86 | 0.92 | 0.89 | 1341 |

| Metric | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Accuracy | – | – | 0.85 | 3790 |
| Macro avg | 0.80 | 0.78 | 0.79 | 3790 |
| Weighted avg | 0.84 | 0.85 | 0.85 | 3790 |
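As a sanity check on the averaging, the macro and weighted rows can be reproduced from the per-class table; a minimal sketch in plain Python, using the rounded per-class values copied from the report above (because the inputs are rounded to two decimals, the weighted recall comes out 0.86 here versus the table's 0.85, which was computed from exact values):

```python
# Per-class (precision, recall, f1, support) from the classification report.
classes = {
    "Negative": (0.89, 0.92, 0.91, 1866),
    "Neutral":  (0.65, 0.50, 0.56, 583),
    "Positive": (0.86, 0.92, 0.89, 1341),
}

total = sum(s for *_, s in classes.values())  # 3790

# Macro average: unweighted mean over the three classes.
macro = [sum(v[i] for v in classes.values()) / len(classes) for i in range(3)]

# Weighted average: mean weighted by class support.
weighted = [sum(v[i] * v[3] for v in classes.values()) / total for i in range(3)]

print("macro    P/R/F1:", [round(x, 2) for x in macro])     # [0.8, 0.78, 0.79]
print("weighted P/R/F1:", [round(x, 2) for x in weighted])  # [0.84, 0.86, 0.85]
```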
### Confusion Matrix

*(Shown as an image in the original model card; not reproduced here.)*
## Usage
```python
from transformers import AutoTokenizer, AutoModelForSequenceClassification, pipeline

model_name = "poltextlab/xlm-roberta-large-hungarian-sentiment-v2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name)

clf = pipeline("text-classification", model=model, tokenizer=tokenizer)

print(clf("Ez egy fantasztikus nap volt!"))  # "This was a fantastic day!"      -> expected: Positive
print(clf("Ez borzalmas élmény volt."))      # "This was a horrible experience." -> expected: Negative
```
## License
MIT License
## Limitations and notes

- The neutral class performs noticeably worse than the positive and negative classes (F1 of 0.56 vs. 0.91 and 0.89).
- Performance may vary with text length, domain, and context.
- Because part of the training data is synthetic, some linguistic patterns may be over- or underrepresented.