---
model_name: "Qwen3-0.6B-en-law-qa"
finetuned_by: "Ahsan Ahmed Khan (Ontario)"
model_type: "Fine-tuned Causal Language Model for Legal Q&A"
base_model: "Qwen/Qwen3-0.6B"
language: "en"
finetuning_method: "LoRA (Low-Rank Adaptation)"
license: "apache-2.0"
datasets:
  - "haistudy/en_law_qa"
tags:
  - "legal"
  - "question-answering"
  - "law"
  - "instruction-tuned"
---

# Model Card for Qwen3-0.6B-en-law-qa

## Model Details

- **Developed by:** Ahsan Ahmed Khan (Ontario)
- **Base Model:** [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Dataset:** [haistudy/en_law_qa](https://huggingface.co/datasets/haistudy/en_law_qa)
- **Language:** English
- **License:** Apache 2.0
- **Fine-tuning Approach:** Parameter-Efficient Fine-Tuning (LoRA)

## Model Description

A fine-tuned version of Qwen3-0.6B optimized for legal question answering, trained on 5,560 legal QA pairs covering:

- Contract law
- Intellectual property
- Criminal law
- Family law
- Environmental law

## Intended Uses

✅ Legal research assistance
✅ Legal education
✅ Explaining legal concepts

❌ Actual legal advice
❌ Handling sensitive personal legal matters

## Training Configuration

```yaml
training_parameters:
  epochs: 73  # partial training; the run was stopped early
  batch_size: 16
  gradient_accumulation_steps: 16
  learning_rate: 2e-4
  optimizer: "paged_adamw_8bit"

quantization:
  load_in_4bit: true
  bnb_4bit_quant_type: "nf4"
  bnb_4bit_compute_dtype: "bfloat16"

lora_config:
  r: 8
  lora_alpha: 32
  target_modules:
    - "q_proj"
    - "k_proj"
    - "v_proj"
    - "o_proj"
  lora_dropout: 0.05
  bias: "none"
```

A hedged code sketch showing how these settings map onto the `transformers` and `peft` APIs is given in the appendix at the end of this card.

## Usage Example

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM
from peft import PeftModel

# Load the base model and attach the LoRA adapter
model_name = "Qwen/Qwen3-0.6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
base_model = AutoModelForCausalLM.from_pretrained(
    model_name, torch_dtype=torch.float16, device_map="auto"
)
model = PeftModel.from_pretrained(base_model, "your-username/Qwen3-0.6B-en-law-qa")

# Create prompt
question = "What are the key elements of a valid contract?"
messages = [
    {"role": "user", "content": question}
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Generate response
inputs = tokenizer(text, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Training Data

```yaml
dataset_stats:
  samples: 5560
  format: |
    <|im_start|>user
    {Question}<|im_end|>
    <|im_start|>assistant
    {Answer}<|im_end|>
data_sources:
  - Contract law
  - Intellectual property
  - Criminal law
  - Family law
  - Environmental law
```

A sketch of how a QA pair is rendered into this format is also included in the appendix.

## Limitations

- Limited to the knowledge in the training data (2023 cutoff)
- May generate plausible but incorrect information
- Not a substitute for professional legal advice
- English-only capability

## Environmental Impact

- **Hardware:** 1 × NVIDIA T4 GPU (Google Colab)
- **CO₂ Emissions:** ≈0.8 kg (estimated for the partial training run)
- **Method:** Calculated using the [Machine Learning Impact calculator](https://mlco2.github.io/impact)

## Contact

For questions or feedback: ahsanahmedkhan@proton.me
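
## Appendix: Training Setup Sketch

The snippet below is a minimal sketch of how the values in the Training Configuration section map onto the `transformers` and `peft` APIs. It is not the original training script: the `prepare_model_for_kbit_training` step, the output directory name, and the dataset loading call are assumptions filled in around the documented hyperparameters.

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

model_name = "Qwen/Qwen3-0.6B"

# Quantization settings from the "quantization" block above
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)  # assumed standard k-bit prep step

# LoRA settings from the "lora_config" block above
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hyperparameters from the "training_parameters" block above
training_args = TrainingArguments(
    output_dir="qwen3-0.6b-en-law-qa",  # hypothetical output path
    per_device_train_batch_size=16,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    optim="paged_adamw_8bit",
    num_train_epochs=73,  # the card notes this run was stopped early
)

dataset = load_dataset("haistudy/en_law_qa", split="train")  # 5,560 QA pairs
```

A `transformers.Trainer` (or `trl.SFTTrainer`) would then consume `training_args` together with the formatted, tokenized dataset.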
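
The chat format shown in the Training Data section can be produced with a small mapping function. Again a sketch rather than the original preprocessing code: the column names `question` and `answer` are assumptions about the `haistudy/en_law_qa` schema.

```python
def format_example(example: dict) -> dict:
    """Render one QA pair into the <|im_start|>/<|im_end|> format above."""
    text = (
        "<|im_start|>user\n"
        f"{example['question']}<|im_end|>\n"
        "<|im_start|>assistant\n"
        f"{example['answer']}<|im_end|>"
    )
    return {"text": text}

# Applied over the whole dataset:
# dataset = dataset.map(format_example)
```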