Model Card for 522H0134-NguyenNhatHuy/Sailor-DPO-1.8B-Chat-SFT

This model is a fine-tuned version of sail/Sailor-1.8B-Chat, trained with LoRA adapters via PEFT and optimized with Direct Preference Optimization (DPO) on Vietnamese prompt-response pairs with safety annotations.
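DPO optimizes the policy directly on preference pairs, without training a separate reward model: it increases the log-probability margin of the chosen response over the rejected one, relative to a frozen reference model. A minimal sketch of the per-pair loss (the function name and the β value are illustrative, not this model's actual training configuration):

```python
import math

def dpo_loss(policy_chosen_logp, policy_rejected_logp,
             ref_chosen_logp, ref_rejected_logp, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * (policy margin - reference margin))."""
    margin = (policy_chosen_logp - policy_rejected_logp) - \
             (ref_chosen_logp - ref_rejected_logp)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# When the policy prefers the chosen response more strongly than the
# reference model does, the margin is positive and the loss drops below log(2).
loss = dpo_loss(-10.0, -14.0, -11.0, -13.0)
```

In practice this loss is computed over sequence-level log-probabilities from the trainable model and the frozen reference, e.g. with a library such as TRL.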

Model Details

Model Description

  • Model type: Causal Language Model (Chat-style) fine-tuned with DPO
  • Language(s): Vietnamese
  • License: Apache 2.0
  • Fine-tuned from: sail/Sailor-1.8B-Chat

This model is fine-tuned to produce safer and more helpful responses by optimizing for user preferences in Vietnamese open-domain chat. It was trained on a dataset of approximately 60% unsafe/harmful and 40% safe prompts, with unsafe prompts identified by a Detoxify toxicity score above 0.5.
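The filtering step described above can be sketched as a simple threshold split. The scores below are illustrative placeholders; in practice they would come from the Detoxify library (e.g. `Detoxify('original').predict(text)['toxicity']`), and the exact pipeline used for this model is not documented here:

```python
# Split prompts into unsafe/safe buckets using a toxicity-score threshold.
THRESHOLD = 0.5

# Placeholder data; real scores would be produced by Detoxify.
prompts = [
    {"text": "...", "toxicity": 0.91},
    {"text": "...", "toxicity": 0.12},
    {"text": "...", "toxicity": 0.67},
]

unsafe = [p for p in prompts if p["toxicity"] > THRESHOLD]   # ~60% of training data
safe = [p for p in prompts if p["toxicity"] <= THRESHOLD]    # ~40% of training data
```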

Uses

Direct Use

  • Vietnamese open-domain conversational AI with improved safety and preference alignment
  • Assisting moderation and filtering workflows by handling unsafe prompts more robustly
  • Instruction-following tasks in Vietnamese

Out-of-Scope Use

  • High-stakes domains such as medical, legal, or financial advice without human oversight
  • Non-Vietnamese language tasks

Bias, Risks, and Limitations

The model may still produce biased, inappropriate, or harmful outputs despite safety fine-tuning. It is not guaranteed to detect or avoid all unsafe content.

Recommendations

Use with caution and always apply human review for critical applications. Continue to monitor and improve with feedback.

How to Get Started with the Model

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

tokenizer = AutoTokenizer.from_pretrained("sail/Sailor-1.8B-Chat", trust_remote_code=True)
base_model = AutoModelForCausalLM.from_pretrained("sail/Sailor-1.8B-Chat", trust_remote_code=True)
model = PeftModel.from_pretrained(base_model, "522H0134-NguyenNhatHuy/Sailor-DPO-1.8B-Chat-SFT")
model.eval()

# Format the prompt with the model's chat template rather than passing raw text.
messages = [{"role": "user", "content": "Bạn hãy giới thiệu về văn hóa Việt Nam."}]  # "Please introduce Vietnamese culture."
inputs = tokenizer.apply_chat_template(messages, add_generation_prompt=True, return_tensors="pt")

outputs = model.generate(inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```
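Sailor's chat variants are built on Qwen1.5, whose chat template follows the ChatML convention. As a rough illustration of what the tokenizer's chat template produces, the prompt can be assembled manually (a sketch assuming standard ChatML markup; prefer `tokenizer.apply_chat_template` in real use, since the repository's template is authoritative):

```python
def build_chatml_prompt(messages):
    """Assemble a ChatML-style prompt, the format used by Qwen1.5-based chat models."""
    parts = []
    for m in messages:
        parts.append(f"<|im_start|>{m['role']}\n{m['content']}<|im_end|>\n")
    parts.append("<|im_start|>assistant\n")  # open the assistant turn for generation
    return "".join(parts)

prompt = build_chatml_prompt([
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Bạn hãy giới thiệu về văn hóa Việt Nam."},
])
```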

Model tree for 522H0134-NguyenNhatHuy/Sailor-1.8B-Chat-DPO

  • Base model: Qwen/Qwen1.5-1.8B
  • Fine-tuned from base: sail/Sailor-1.8B
  • Adapter: this model