---
base_model:
- Qwen/Qwen3-0.6B
library_name: peft
model_name: Qwen-3-0.6B-it-Medical-LoRA
tags:
- generated_from_trainer
- trl
- sft
- unsloth
license: mit
language:
- vi
datasets:
- tmnam20/ViMedAQA
---

# Model Card for Qwen-3-0.6B-it-Medical-LoRA

This model is a LoRA adapter fine-tuned from [unsloth/qwen3-0.6b-unsloth-bnb-4bit](https://huggingface.co/unsloth/qwen3-0.6b-unsloth-bnb-4bit), a 4-bit quantized build of Qwen/Qwen3-0.6B. It was trained using [TRL](https://github.com/huggingface/trl).

## Training procedure

This model was trained with supervised fine-tuning (SFT) on the Vietnamese medical question-answering dataset `tmnam20/ViMedAQA`.

## Usage

### Hugging Face Authentication

Log in to the Hugging Face Hub. The access token is read from the `HUGGINGFACEHUB_API_TOKEN` environment variable instead of being hard-coded in the script:

```python
import os
from huggingface_hub import login

# Read the access token from the environment; set it beforehand, e.g.
#   export HUGGINGFACEHUB_API_TOKEN=hf_...
token = os.environ.get("HUGGINGFACEHUB_API_TOKEN")

# Log in to the Hugging Face Hub (prompts interactively if token is None)
login(token=token)
```

### Inference

Load the base model, apply the LoRA adapter, and stream the answer to a Vietnamese medical question:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, TextStreamer
from peft import PeftModel

device = "cuda" if torch.cuda.is_available() else "cpu"

# Define model and LoRA adapter paths
base_model_name = "Qwen/Qwen3-0.6B"
lora_adapter_name = "danhtran2mind/Qwen-3-0.6B-Instruct-Vi-Medical-LoRA"

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained(base_model_name)

# Load base model: FP16 on GPU for efficiency, FP32 on CPU where FP16 is poorly supported
model = AutoModelForCausalLM.from_pretrained(
    base_model_name,
    torch_dtype=torch.float16 if device == "cuda" else torch.float32,
    device_map=device,
    trust_remote_code=True,
)

# Apply LoRA adapter
model = PeftModel.from_pretrained(model, lora_adapter_name)

# Set model to evaluation mode
model.eval()

# "When a stomach or duodenal ulcer is suspected, which hospital department
# should one visit for an examination?"
prompt = ("Khi nghi ngờ bị loét dạ dày tá tràng nên đến khoa nào "
          "tại bệnh viện để thăm khám?")

# Set random seed for reproducibility (torch.manual_seed also seeds all CUDA devices)
torch.manual_seed(42)

messages = [
    {"role": "user", "content": prompt}
]

text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # Must add for generation
    enable_thinking=False,       # Disable Qwen3 thinking mode
)

inputs = tokenizer(text, return_tensors="pt").to(device)

_ = model.generate(
    **inputs,
    max_new_tokens=2048,  # Increase for longer outputs
    do_sample=True,       # Enable sampling so temperature/top_p/top_k take effect
    temperature=0.7,      # Sampling settings for non-thinking mode
    top_p=0.9,
    top_k=20,
    streamer=TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True),
)
```

Example output:

```markdown
Khi nghi ngờ bị loét dạ dày tá tràng, bạn nên đến phòng khám chuyên khoa Giai đoạn Trung tâm Nghi ngờ Loét Dạ dày để được tư vấn và đánh giá chẩn đoán chính xác.
```

English translation: "When you suspect a stomach or duodenal ulcer, you should go to the 'Suspected Gastric Ulcer Stage Center' specialty clinic to receive advice and an accurate diagnostic assessment."

### Framework versions

- PEFT: 0.15.2
- TRL: 0.19.1
- Transformers: 4.51.3
- PyTorch: 2.7.0
- Datasets: 3.6.0
- Tokenizers: 0.21.1

## Citations

Cite TRL as:

```bibtex
@misc{vonwerra2022trl,
	title        = {{TRL: Transformer Reinforcement Learning}},
	author       = {Leandro von Werra and Younes Belkada and Lewis Tunstall and Edward Beeching and Tristan Thrush and Nathan Lambert and Shengyi Huang and Kashif Rasul and Quentin Gallou{\'e}dec},
	year         = 2020,
	journal      = {GitHub repository},
	publisher    = {GitHub},
	howpublished = {\url{https://github.com/huggingface/trl}}
}
```