---
base_model:
- Qwen/Qwen3-0.6B
library_name: peft
model_name: Qwen-3-0.6B-it-Medical-LoRA
tags:
- generated_from_trainer
- trl
- sft
- unsloth
license: mit
language:
- vi
datasets:
- tmnam20/ViMedAQA
---

# Model Card for Qwen-3-0.6B-it-Medical-LoRA

This model is a LoRA adapter fine-tuned from [unsloth/qwen3-0.6b-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen3-0.6b-unsloth-bnb-4bit) on the Vietnamese medical question-answering dataset [tmnam20/ViMedAQA](https://huggingface.co/datasets/tmnam20/ViMedAQA).
It was trained using [TRL](https://github.com/huggingface/trl).

## Training procedure

This model was trained with supervised fine-tuning (SFT).
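
As a rough illustration of what SFT data preparation can look like, the sketch below turns one question-answer record into the conversational `messages` format that chat templates and TRL's `SFTTrainer` accept. The field names `question` and `answer` and the helper `to_chat_example` are assumptions made for illustration; the preprocessing actually used for this model is not documented in this card.

```python
from typing import Dict, List


def to_chat_example(record: Dict[str, str]) -> Dict[str, List[Dict[str, str]]]:
    """Convert a QA record into a conversational training example.

    Assumes hypothetical `question` and `answer` fields; adjust these to the
    dataset's real column names.
    """
    return {
        "messages": [
            {"role": "user", "content": record["question"].strip()},
            {"role": "assistant", "content": record["answer"].strip()},
        ]
    }


# Placeholder record: the answer text is a stand-in, not real dataset content.
sample = {
    "question": "Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào? ",
    "answer": " <reference answer> ",
}
print(to_chat_example(sample))
```

Keeping the conversion in a small pure function makes it easy to apply with `datasets.Dataset.map` and to check in isolation before starting a training run.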

## Usage

### Hugging Face Authentication

```python
import os

from huggingface_hub import login

# Set the Hugging Face API token (replace the placeholder with your own token)
os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"

# Log in to the Hugging Face Hub
login(token=os.environ["HUGGINGFACEHUB_API_TOKEN"])
```
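
Hardcoding a token in a script makes it easy to leak. As a minimal sketch, assuming the token has already been exported in the shell, the hypothetical helper `resolve_hf_token` below reads it from the environment. It checks `HF_TOKEN` (the variable `huggingface_hub` itself reads) before falling back to the `HUGGINGFACEHUB_API_TOKEN` name used in this card.

```python
import os
from typing import Mapping, Optional


def resolve_hf_token(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty token found in the environment, else None.

    `resolve_hf_token` is an illustrative helper, not part of huggingface_hub.
    """
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HUGGINGFACEHUB_API_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    return None


# Demonstration with an explicit mapping instead of the real environment.
print(resolve_hf_token({"HUGGINGFACEHUB_API_TOKEN": "hf_example"}))

# Usage (requires huggingface_hub and network access):
#     from huggingface_hub import login
#     login(token=resolve_hf_token())
```

Passing the environment as an optional argument keeps the lookup order easy to verify without touching real credentials.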

### Inference

```python
import torch
from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer

device = "cuda" if torch.cuda.is_available() else "cpu"

# Define model and LoRA adapter paths
base_model_name = "Qwen/Qwen3-0.6B"
lora_adapter_name = "danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base model in half precision
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16,  # Use FP16 for efficiency
    device_map=device,
    trust_remote_code=True,
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)

# Set model to evaluation mode
model.eval()

# Vietnamese prompt. In English: "If you suspect a peptic ulcer, which
# hospital department should you visit for an examination?"
prompt = (
    "Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
    "tại bệnh viện để thăm khám?"
)

# Set random seed for reproducibility
seed = 42
torch.manual_seed(seed)
if torch.cuda.is_available():
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)

messages = [
    {"role": "user", "content": prompt}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Must add for generation
    enable_thinking=False,  # Disable thinking
)

# Tokenize the prompt and stream the generated answer to stdout
inputs = tokenizer(text, return_tensors="pt").to(device)
_ = model.generate(
    **inputs,
    max_new_tokens=2048,  # Increase for longer outputs
    do_sample=True,  # Sampling must be on for temperature/top_p/top_k to apply
    temperature=0.7, top_p=0.9, top_k=20,  # Settings for non-thinking mode
    streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
)
```

Example output (in Vietnamese):

```markdown
Khi nghi ngờ bị loét dạ dày tá tràng, bạn nên đến phòng khám chuyên khoa Giai đoạn Trung tâm Nghi ngờ Loét Dạ dày để được tư vấn và đánh giá chẩn đoán chính xác.
```
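
The inference example disables thinking mode with `enable_thinking = False`. If it is set to `True` instead, Qwen3 models write their reasoning before the final answer and close it with a `</think>` tag. The sketch below separates the two parts, assuming the decoded text still contains that literal tag; `split_thinking` is a hypothetical helper and not part of Transformers or PEFT.

```python
from typing import Tuple


def split_thinking(decoded: str) -> Tuple[str, str]:
    """Split decoded model output into (reasoning, answer).

    Assumes the reasoning, if present, ends with a literal `</think>` tag.
    """
    head, sep, tail = decoded.rpartition("</think>")
    if not sep:
        # No closing tag found: treat the whole text as the answer.
        return "", decoded.strip()
    # Drop the opening tag (if any) and surrounding whitespace.
    reasoning = head.replace("<think>", "", 1).strip()
    return reasoning, tail.strip()


reasoning, answer = split_thinking("<think>\nshort plan\n</think>\n\nfinal answer")
print(repr(reasoning))  # 'short plan'
print(repr(answer))     # 'final answer'
```

Splitting on the last `</think>` keeps the helper safe to call on non-thinking output as well, where it simply returns an empty reasoning string.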

### Framework versions

- PEFT: 0.15.2
- TRL: 0.19.1
- Transformers: 4.51.3
- PyTorch: 2.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
    title        = {{TRL: Transformer Reinforcement Learning}},
    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
    year         = 2020,
    journal      = {GitHub repository},
    publisher    = {GitHub},
    howpublished = {\url{https://github.com/huggingface/trl}}
}
```