---
base_model:
- Qwen/Qwen3-0.6B
library_name: peft
model_name: Qwen-3-0.6B-it-Medical-LoRA
tags:
- generated_from_trainer
- trl
- sft
- unsloth
licence: license
license: mit
language:
- vi
datasets:
- tmnam20/ViMedAQA
---

# Model Card for Qwen-3-0.6B-it-Medical-LoRA

This model is a fine-tuned version of [unsloth/qwen3-0.6b-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen3-0.6b-unsloth-bnb-4bit).
It has been trained using [TRL](https://github.com/huggingface/trl).

## Training procedure

 
This model was trained with SFT.


## Usage

### HuggingFace Authentication
```python
import os
from huggingface_hub import login

# Set the Hugging Face API token
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"

# # Initialize API
login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
```

### Inference

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from transformers import TextStreamer
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Define model and LoRA adapter paths
base_model_name = "Qwen/Qwen3-0.6B"
lora_adapter_name = "danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base model with optimized settings
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map=device,
    trust_remote_code=True
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)

# Set model to evaluation mode
model.eval()

prompt = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
            "tại bệnh viện để thăm khám?")

# Set random seed for reproducibility
seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

messages = [
    {"role" : "user", "content" : prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize = False,
    add_generation_prompt = True, # Must add for generation
    enable_thinking = False, # Disable thinking
)

_ = model.generate(
    **tokenizer(text, return_tensors = "pt").to(device),
    max_new_tokens = 2048, # Increase for longer outputs!
    temperature = 0.7, top_p = 0.9, top_k = 20, # For non thinking
    streamer = TextStreamer(tokenizer, skip_prompt = True, skip_special_tokens=True),
)
```

```markdown
Khi nghi ngờ bị loét dạ dày tá tràng, bạn nên đến phòng khám chuyên khoa Giai đoạn Trung tâm Nghi ngờ Loét Dạ dày để được tư vấn và đánh giá chẩn đoán chính xác.
```
### Framework versions

- PEFT 0.15.2
- TRL: 0.19.1
- Transformers: 4.51.3
- Pytorch: 2.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citations


Cite TRL as:
    
```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```