---
base_model: Qwen/Qwen3-0.6B
tags:
- ellora
- lora
- quantization
- accuracy-recovery
- distillation
- magpie
- efficiency
- peft
- qwen3
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
model_type: qwen3
datasets:
- codelion/Qwen3-0.6B-magpie
---

# codelion/Qwen3-0.6B-accuracy-recovery-lora

## 🎯 Accuracy Recovery LoRA Adapter

This LoRA adapter helps recover accuracy when using INT4-quantized versions of Qwen/Qwen3-0.6B. It was trained using self-distillation on Magpie-generated data.

## 📊 Performance Metrics

- **Base Model**: Qwen/Qwen3-0.6B
- **Quantization**: INT4 (NF4 via bitsandbytes)
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
- **Training Samples**: 610
- **Target Performance Gap**: <5% perplexity increase over the FP16 baseline (a hypothetical measurement sketch appears at the end of this card)

## 🔧 Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model with INT4 (NF4) quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=quantization_config,
    device_map="auto"
)

# Load the accuracy-recovery LoRA adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-0.6B-accuracy-recovery-lora")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Generate (inputs must live on the same device as the model)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## 🧪 Training Details

- **Method**: Self-distillation using Magpie data generation
- **Framework**: PEFT + LoRA
- **Loss Function**: Combined KL divergence + MSE loss (a hypothetical sketch appears at the end of this card)
- **Temperature**: 1.0
- **Alpha (distillation weight)**: 0.01

## 📈 Expected Benefits

- ✅ Maintains accuracy close to the FP16 baseline
- ✅ ~75% reduction in weight memory (4-bit instead of 16-bit weights: roughly 0.3 GB instead of 1.2 GB for 0.6B parameters)
- ✅ 2-3x faster inference than FP16
- ✅ Easy to integrate into existing workflows

## 🏷️ Related

- **Dataset**: [codelion/Qwen3-0.6B-magpie](https://huggingface.co/datasets/codelion/Qwen3-0.6B-magpie)
- **Base Model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Framework**: [PEFT](https://github.com/huggingface/peft)

---

*This adapter is part of the [Ellora project](https://github.com/codelion/ellora) - standardized recipes for enhancing LLM capabilities.*
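
## 📏 Appendix: Checking the Perplexity Gap

The <5% perplexity target above can be sanity-checked locally. The snippet below is a minimal sketch, not an official evaluation script: the `perplexity` helper and the sample text are illustrative, and a real check would average the loss over a held-out corpus.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

def perplexity(model, tokenizer, text):
    # Perplexity = exp(average next-token cross-entropy)
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", quantization_config=quantization_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

text = "The quick brown fox jumps over the lazy dog."
ppl_quantized = perplexity(base, tokenizer, text)

# Attach the recovery adapter and measure again
recovered = PeftModel.from_pretrained(base, "codelion/Qwen3-0.6B-accuracy-recovery-lora")
ppl_recovered = perplexity(recovered, tokenizer, text)

print(f"INT4 only: {ppl_quantized:.2f}  INT4 + LoRA: {ppl_recovered:.2f}")
```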
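
## 🧮 Appendix: Distillation Loss Sketch

The actual training recipe lives in the Ellora repository; for illustration only, here is a hypothetical sketch of how a combined KL + MSE objective with the reported temperature and alpha settings might be written. The function name and the way `alpha` balances the two terms are assumptions, not the project's actual code.

```python
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits,
                               temperature=1.0, alpha=0.01):
    # Soften both distributions with the temperature (1.0 leaves them unchanged)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between teacher and student token distributions;
    # the T^2 factor keeps gradient scale comparable across temperatures
    kl = F.kl_div(student_log_probs, teacher_probs,
                  reduction="batchmean") * temperature ** 2

    # MSE between raw logits as an auxiliary matching term
    mse = F.mse_loss(student_logits, teacher_logits)

    # Assumption: alpha weights the KL term against the MSE term
    return alpha * kl + (1.0 - alpha) * mse
```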