Pankaj8922/better-opus-mt-en-hi
Fine-tuned MarianMT model for English → Hindi translation. This model is trained on AI4Bharat's Samanantar dataset, which contains over 10 million high-quality parallel sentences.
Model Details
- Base model: Helsinki-NLP/opus-mt-en-hi
- Fine-tuned on: ai4bharat/samanantar (English-Hindi subset)
- Total params: ~77M (MarianMT)
- Framework: Hugging Face Transformers
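Since the model is a MarianMT checkpoint for Hugging Face Transformers, it can presumably be loaded with the standard `MarianMTModel` and `MarianTokenizer` classes. The following is a minimal sketch, not the author's documented usage: it assumes the repository id matches the card title, that `transformers`, `torch`, and `sentencepiece` are installed, and that default generation settings are acceptable. The `batched` and `translate` helpers are hypothetical names introduced for illustration.

```python
from typing import Iterator, List, Sequence

# Assumption: the Hub repository id is the one shown in the card title.
MODEL_ID = "Pankaj8922/better-opus-mt-en-hi"


def batched(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive chunks of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def translate(sentences: Sequence[str], model_id: str = MODEL_ID,
              batch_size: int = 8) -> List[str]:
    """Translate English sentences to Hindi (illustrative helper)."""
    # Imported lazily so the batching helper works without transformers installed.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_id)
    model = MarianMTModel.from_pretrained(model_id)

    results: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch; truncate overly long inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        generated = model.generate(**inputs)
        results.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return results
```

Calling `translate(["The clinic opens at nine."])` would download the weights on first use and return a list with one Hindi string. Batching keeps memory bounded when translating long documents sentence by sentence.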
Performance (BLEU / chrF on 500 samples from Namratap/En-Hindi)
Domain | Base BLEU | Fine-tuned BLEU | Base chrF | Fine-tuned chrF |
---|---|---|---|---|
Healthcare | 15.54 | 27.95 | 38.06 | 54.09 |
Gen News | 14.11 | 26.31 | 39.07 | 52.98 |
Culture/Tourism | 12.76 | 18.49 | 35.07 | 41.32 |
Education | 20.28 | 28.82 | 43.84 | 49.68 |
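- BLEU improvements of roughly +5.7 to +12.4 points across domains, with the smallest gain in Culture/Tourism
- chrF gains of up to +16 points (Healthcare), indicating closer character-level overlap with the reference translations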
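The per-domain gains can be recomputed directly from the table. The short sketch below copies the four rows as reported and subtracts base from fine-tuned scores; the `SCORES` and `deltas` names are illustrative only.

```python
# Scores copied from the table: (base BLEU, fine-tuned BLEU, base chrF, fine-tuned chrF).
SCORES = {
    "Healthcare": (15.54, 27.95, 38.06, 54.09),
    "Gen News": (14.11, 26.31, 39.07, 52.98),
    "Culture/Tourism": (12.76, 18.49, 35.07, 41.32),
    "Education": (20.28, 28.82, 43.84, 49.68),
}


def deltas(scores):
    """Return {domain: (BLEU gain, chrF gain)}, rounded to two decimals."""
    return {
        domain: (round(ft_bleu - base_bleu, 2), round(ft_chrf - base_chrf, 2))
        for domain, (base_bleu, ft_bleu, base_chrf, ft_chrf) in scores.items()
    }


for domain, (bleu_gain, chrf_gain) in deltas(SCORES).items():
    print(f"{domain}: BLEU +{bleu_gain:.2f}, chrF +{chrf_gain:.2f}")
```

The BLEU gains come out between +5.73 (Culture/Tourism) and +12.41 (Healthcare), and the chrF gains between +5.84 (Education) and +16.03 (Healthcare).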
Use Cases
- Book and news translation into Hindi
- Offline/secure translation pipelines
- Domain-adapted fine-tuning
Files Included
- pytorch_model.bin: fine-tuned model weights
- config.json: model architecture
- tokenizer_config.json, vocab.json, source.spm, target.spm: tokenizer
- generation_config.json: default decoding setup
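For offline or secure pipelines, it can help to confirm that a local copy of the repository contains every file listed above before loading it. The sketch below is an illustration under stated assumptions: `EXPECTED_FILES` mirrors the list in this section, and `missing_files` is a hypothetical helper, not part of any library.

```python
from pathlib import Path
from typing import List

# File names taken from the "Files Included" list on this card.
EXPECTED_FILES = [
    "pytorch_model.bin",
    "config.json",
    "tokenizer_config.json",
    "vocab.json",
    "source.spm",
    "target.spm",
    "generation_config.json",
]


def missing_files(model_dir: str) -> List[str]:
    """Return the expected file names that are absent from `model_dir`."""
    root = Path(model_dir)
    return [name for name in EXPECTED_FILES if not (root / name).is_file()]
```

If `missing_files(path)` returns an empty list, the directory can be passed to `from_pretrained(path, local_files_only=True)` in Transformers so that no network request is attempted.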
License
Apache 2.0 (Same as original model and Samanantar dataset)