---
base_model: Qwen/Qwen3-0.6B
tags:
- ellora
- lora
- quantization
- accuracy-recovery
- distillation
- magpie
- efficiency
- peft
- qwen3
library_name: peft
license: apache-2.0
language:
- en
pipeline_tag: text-generation
inference: false
model_type: qwen3
datasets:
- codelion/Qwen3-0.6B-magpie
---

# codelion/Qwen3-0.6B-accuracy-recovery-lora

## 🎯 Accuracy Recovery LoRA Adapter

This LoRA adapter helps recover accuracy when using INT4-quantized versions of Qwen/Qwen3-0.6B. It was trained using self-distillation on Magpie-generated data.

## 📊 Performance Metrics

- **Base Model**: Qwen/Qwen3-0.6B
- **Quantization**: INT4 (NF4 via bitsandbytes)
- **LoRA Rank**: 64
- **LoRA Alpha**: 128
- **Training Samples**: 610
- **Target Performance Gap**: <5% perplexity increase over the FP16 baseline (a hypothetical measurement sketch appears at the end of this card)

## 🔧 Usage

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model with INT4 (NF4) quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4"
)

model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=quantization_config,
    device_map="auto"
)

# Load the accuracy-recovery LoRA adapter
model = PeftModel.from_pretrained(model, "codelion/Qwen3-0.6B-accuracy-recovery-lora")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Generate (inputs must live on the same device as the model)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```

## 🧪 Training Details

- **Method**: Self-distillation using Magpie data generation
- **Framework**: PEFT + LoRA
- **Loss Function**: Combined KL divergence + MSE loss (a hypothetical sketch appears at the end of this card)
- **Temperature**: 1.0
- **Alpha (distillation weight)**: 0.01

## 📈 Expected Benefits

- ✅ Maintains accuracy close to the FP16 baseline
- ✅ ~75% reduction in weight memory (4-bit instead of 16-bit weights: roughly 0.3 GB instead of 1.2 GB for 0.6B parameters)
- ✅ 2-3x faster inference than FP16
- ✅ Easy to integrate into existing workflows

## 🏷️ Related

- **Dataset**: [codelion/Qwen3-0.6B-magpie](https://huggingface.co/datasets/codelion/Qwen3-0.6B-magpie)
- **Base Model**: [Qwen/Qwen3-0.6B](https://huggingface.co/Qwen/Qwen3-0.6B)
- **Framework**: [PEFT](https://github.com/huggingface/peft)

---

*This adapter is part of the [Ellora project](https://github.com/codelion/ellora) - standardized recipes for enhancing LLM capabilities.*
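
## 📏 Appendix: Checking the Perplexity Gap

The <5% perplexity target above can be sanity-checked locally. The snippet below is a minimal sketch, not an official evaluation script: the `perplexity` helper and the sample text are illustrative, and a real check would average the loss over a held-out corpus.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

def perplexity(model, tokenizer, text):
    # Perplexity = exp(average next-token cross-entropy)
    enc = tokenizer(text, return_tensors="pt").to(model.device)
    with torch.no_grad():
        loss = model(**enc, labels=enc["input_ids"]).loss
    return math.exp(loss.item())

quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
base = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", quantization_config=quantization_config, device_map="auto"
)
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

text = "The quick brown fox jumps over the lazy dog."
ppl_quantized = perplexity(base, tokenizer, text)

# Attach the recovery adapter and measure again
recovered = PeftModel.from_pretrained(base, "codelion/Qwen3-0.6B-accuracy-recovery-lora")
ppl_recovered = perplexity(recovered, tokenizer, text)

print(f"INT4 only: {ppl_quantized:.2f}  INT4 + LoRA: {ppl_recovered:.2f}")
```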
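
## 🧮 Appendix: Distillation Loss Sketch

The actual training recipe lives in the Ellora repository; for illustration only, here is a hypothetical sketch of how a combined KL + MSE objective with the reported temperature and alpha settings might be written. The function name and the way `alpha` balances the two terms are assumptions, not the project's actual code.

```python
import torch.nn.functional as F

def combined_distillation_loss(student_logits, teacher_logits,
                               temperature=1.0, alpha=0.01):
    # Soften both distributions with the temperature (1.0 leaves them unchanged)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)

    # KL divergence between teacher and student token distributions;
    # the T^2 factor keeps gradient scale comparable across temperatures
    kl = F.kl_div(student_log_probs, teacher_probs,
                  reduction="batchmean") * temperature ** 2

    # MSE between raw logits as an auxiliary matching term
    mse = F.mse_loss(student_logits, teacher_logits)

    # Assumption: alpha weights the KL term against the MSE term
    return alpha * kl + (1.0 - alpha) * mse
```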