Pankaj8922/better-opus-mt-en-hi

Fine-tuned MarianMT model for English → Hindi translation. This model is trained on AI4Bharat's Samanantar dataset, which contains over 10 million high-quality parallel sentences.

πŸ” Model Details

  • Base model: Helsinki-NLP/opus-mt-en-hi
  • Fine-tuned on: ai4bharat/samanantar English–Hindi subset
  • Total params: ~77M (MarianMT)
  • Framework: Hugging Face Transformers
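
Since the model is a MarianMT checkpoint for Hugging Face Transformers, loading it might look like the sketch below. It assumes the repository ID from this card's title, an installed `transformers`, `torch`, and `sentencepiece`, and default decoding settings. The names `translate` and `batched` are illustrative helpers, not part of the model repository.

```python
from typing import Iterator, List

# Assumption: repository ID taken from the title of this card.
MODEL_ID = "Pankaj8922/better-opus-mt-en-hi"


def batched(items: List[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def translate(sentences: List[str], batch_size: int = 8) -> List[str]:
    """Translate English sentences to Hindi (hypothetical helper)."""
    # Imported here so `batched` stays usable without the heavy dependencies.
    import torch
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(MODEL_ID)
    model = MarianMTModel.from_pretrained(MODEL_ID)
    model.eval()

    translations: List[str] = []
    for batch in batched(sentences, batch_size):
        # Pad to the longest sentence in the batch; truncate overlong inputs.
        inputs = tokenizer(batch, return_tensors="pt", padding=True, truncation=True)
        with torch.no_grad():
            generated = model.generate(**inputs)
        translations.extend(tokenizer.batch_decode(generated, skip_special_tokens=True))
    return translations
```

Calling `translate(["How are you today?"])` should return a list containing one Hindi string. The first call downloads the weights, so it needs network access unless the files are already cached locally.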

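📊 Performance (BLEU / chrF on 500 samples from Namratap/En-Hindi)

| Domain          | Base BLEU | Fine-tuned BLEU | Base chrF | Fine-tuned chrF |
|-----------------|-----------|-----------------|-----------|-----------------|
| Healthcare      | 15.54     | 27.95           | 38.06     | 54.09           |
| Gen News        | 14.11     | 26.31           | 39.07     | 52.98           |
| Culture/Tourism | 12.76     | 18.49           | 35.07     | 41.32           |
| Education       | 20.28     | 28.82           | 43.84     | 49.68           |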

✅ BLEU improvements of +5.7 to +12.4 points across domains (smallest for Culture/Tourism, largest for Healthcare)
✅ chrF gains of +5.8 to +16.0 points, indicating closer character-level overlap with the reference translations
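
The per-domain gains can be recomputed directly from the scores reported above. The short sketch below only does the subtraction. It does not rerun the evaluation, and the dictionary and function names are illustrative.

```python
# Scores copied from the performance table: (base, fine-tuned).
BLEU = {
    "Healthcare": (15.54, 27.95),
    "Gen News": (14.11, 26.31),
    "Culture/Tourism": (12.76, 18.49),
    "Education": (20.28, 28.82),
}
CHRF = {
    "Healthcare": (38.06, 54.09),
    "Gen News": (39.07, 52.98),
    "Culture/Tourism": (35.07, 41.32),
    "Education": (43.84, 49.68),
}


def gains(scores):
    """Return fine-tuned minus base for each domain, rounded to two decimals."""
    return {domain: round(tuned - base, 2) for domain, (base, tuned) in scores.items()}


bleu_gain = gains(BLEU)
chrf_gain = gains(CHRF)

for domain in BLEU:
    print(f"{domain:16s} BLEU +{bleu_gain[domain]:.2f}   chrF +{chrf_gain[domain]:.2f}")
```

On these figures the BLEU gain ranges from 5.73 (Culture/Tourism) to 12.41 (Healthcare), and the chrF gain from 5.84 (Education) to 16.03 (Healthcare).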

🧠 Use Cases

  • English-to-Hindi translation of books and news
  • Offline/secure translation pipelines
  • Domain-adapted fine-tuning

πŸ“ Files Included

  • pytorch_model.bin β€” fine-tuned model weights
  • config.json β€” model architecture
  • tokenizer_config.json, vocab.json, source.spm, target.spm β€” tokenizer
  • generation_config.json β€” default decoding setup

βš–οΈ License

Apache 2.0 (Same as original model and Samanantar dataset)
