danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA


	---
	base_model:
	- Qwen/Qwen3-0.6B
	library_name: peft
	model_name: Qwen-3-0.6B-it-Medical-LoRA
	tags:
	- generated_from_trainer
	- trl
	- sft
	- unsloth
	license: mit
	language:
	- vi
	datasets:
	- tmnam20/ViMedAQA
	---

	# Model Card for Qwen-3-0.6B-it-Medical-LoRA

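	This model is a LoRA adapter fine-tuned from [unsloth/qwen3-0.6b-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen3-0.6b-unsloth-bnb-4bit) on the Vietnamese medical QA dataset [tmnam20/ViMedAQA](https://huggingface.co/datasets/tmnam20/ViMedAQA).
	It was trained using [TRL](https://github.com/huggingface/trl).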

	## Training procedure

	This model was trained with supervised fine-tuning (SFT).
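SFT with TRL is commonly run on chat-formatted records. As a minimal sketch, the snippet below shows one way a question-answer pair could be wrapped in the conversational `messages` layout. The `to_chat_example` helper and the `question`/`answer` field names are assumptions for illustration; the preprocessing actually used for this adapter is not documented here, and the real ViMedAQA column names may differ.

```python
def to_chat_example(question: str, answer: str) -> dict:
    """Wrap one QA pair in the conversational `messages` layout."""
    return {
        "messages": [
            {"role": "user", "content": question.strip()},
            {"role": "assistant", "content": answer.strip()},
        ]
    }


# Hypothetical record; the field names are assumed, not taken from the dataset.
record = {"question": " Example medical question? ", "answer": "Example answer. "}
example = to_chat_example(record["question"], record["answer"])
```

Each record then carries one user turn and one assistant turn, which the tokenizer's chat template can render into training text.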


	## Usage

	### Hugging Face Authentication

	```python
	import os
	from huggingface_hub import login

	# Store the Hugging Face access token in an environment variable
	os.environ["HUGGINGFACEHUB_API_TOKEN"] = "<your_huggingface_token>"

	# Log in to the Hugging Face Hub with that token
	login(os.environ.get("HUGGINGFACEHUB_API_TOKEN"))
	```
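To avoid hard-coding a token in a script, it can be read from the environment instead. The sketch below is one possible approach: `resolve_hf_token` is a hypothetical helper, and checking `HF_TOKEN` before `HUGGINGFACEHUB_API_TOKEN` is an assumption about which variable a user may have set.

```python
import os


def resolve_hf_token(env=None):
    """Return the first non-empty token found in the environment, or None."""
    env = os.environ if env is None else env
    for name in ("HF_TOKEN", "HUGGINGFACEHUB_API_TOKEN"):
        token = env.get(name, "").strip()
        if token:
            return token
    return None


token = resolve_hf_token()
if token is None:
    print("No token found; login is generally only needed for private or gated repositories.")
# Otherwise, pass `token` to huggingface_hub.login(token) as in the snippet above.
```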

	### Inference

	```python
	import torch
	from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
	from peft import PeftModel

	device = "cuda" if torch.cuda.is_available() else "cpu"

	# Define model and LoRA adapter paths
	base_model_name = "Qwen/Qwen3-0.6B"
	lora_adapter_name = "danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA"

	# Load tokenizer
	tokenizer = AutoTokenizer.from_pretrained(base_model_name)

	# Load the base model in half precision
	model = AutoModelForCausalLM.from_pretrained(
	    base_model_name,
	    torch_dtype=torch.float16,  # FP16 reduces memory use
	    device_map=device,
	    trust_remote_code=True,
	)

	# Apply LoRA adapter
	model = PeftModel.from_pretrained(model, lora_adapter_name)

	# Set model to evaluation mode
	model.eval()

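	# Vietnamese prompt: "When a peptic ulcer is suspected, which hospital
	# department should you visit for an examination?"
	prompt = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
	          "tại bệnh viện để thăm khám?")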

	# Set random seed for reproducibility
	seed = 42
	torch.manual_seed(seed)
	if torch.cuda.is_available():
	    torch.cuda.manual_seed(seed)
	    torch.cuda.manual_seed_all(seed)

	messages = [
	    {"role": "user", "content": prompt}
	]
	text = tokenizer.apply_chat_template(
	    messages,
	    tokenize=False,
	    add_generation_prompt=True,  # Append the assistant turn marker so the model replies
	    enable_thinking=False,  # Disable Qwen3 thinking mode
	)

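	# Stream the reply to stdout as it is generated
	_ = model.generate(
	    **tokenizer(text, return_tensors="pt").to(device),
	    max_new_tokens=2048,  # Upper bound on reply length; raise for longer outputs
	    do_sample=True,  # Sampling must be on for temperature/top_p/top_k to apply
	    temperature=0.7, top_p=0.9, top_k=20,  # Sampling settings for non-thinking mode
	    streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
	)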
	```

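	Example output (in Vietnamese; roughly: "When a peptic ulcer is suspected, you should go to the 'Stage Center for Suspected Gastric Ulcer' specialist clinic for consultation and an accurate diagnostic evaluation"):

	```markdown
	Khi nghi ngờ bị loét dạ dày tá tràng, bạn nên đến phòng khám chuyên khoa Giai đoạn Trung tâm Nghi ngờ Loét Dạ dày để được tư vấn và đánh giá chẩn đoán chính xác.
	```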
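The inference snippet streams the reply to stdout. When the answer is needed as a string, for example to log or post-process it, a helper along these lines could be used. This is a sketch only: `generate_answer` is a hypothetical name, it assumes `model` and `tokenizer` are loaded as shown above, and the sampling values simply mirror that snippet.

```python
def generate_answer(model, tokenizer, prompt, max_new_tokens=512):
    """Return the model's reply to `prompt` as a plain string."""
    messages = [{"role": "user", "content": prompt}]
    text = tokenizer.apply_chat_template(
        messages,
        tokenize=False,
        add_generation_prompt=True,
        enable_thinking=False,  # Same Qwen3 chat-template switch as above
    )
    inputs = tokenizer(text, return_tensors="pt").to(model.device)
    output_ids = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=True,
        temperature=0.7,
        top_p=0.9,
        top_k=20,
    )
    # generate() returns prompt + completion; keep only the new tokens.
    prompt_length = inputs["input_ids"].shape[1]
    return tokenizer.decode(output_ids[0][prompt_length:], skip_special_tokens=True)
```

Calling `generate_answer(model, tokenizer, prompt)` with the objects from the inference snippet would then return the reply text instead of printing it.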
	### Framework versions

	- PEFT: 0.15.2
	- TRL: 0.19.1
	- Transformers: 4.51.3
	- PyTorch: 2.7.0
	- Datasets: 3.6.0
	- Tokenizers: 0.21.1

	## Citations

	Cite TRL as:

	```bibtex
	@misc{vonwerra2022trl,
	    title        = {{TRL: Transformer Reinforcement Learning}},
	    author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	    year         = 2020,
	    journal      = {GitHub repository},
	    publisher    = {GitHub},
	    howpublished = {\url{https://github.com/huggingface/trl}}
	}
	```