# Ellora: Enhancing LLMs with LoRA - Standardized Recipes for Capability Enhancement
This LoRA adapter helps recover accuracy when using INT4 quantized versions of Qwen/Qwen3-0.6B. It was trained using self-distillation with Magpie-generated data.
To use it, load the INT4-quantized base model and apply the adapter on top:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import PeftModel

# Load the base model with INT4 (NF4) quantization
quantization_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
    bnb_4bit_use_double_quant=True,
    bnb_4bit_quant_type="nf4",
)
model = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B",
    quantization_config=quantization_config,
    device_map="auto",
)

# Load the accuracy-recovery LoRA adapter on top of the quantized model
model = PeftModel.from_pretrained(model, "codelion/Qwen3-0.6B-accuracy-recovery-lora")

# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Run generation (move inputs to the model's device)
inputs = tokenizer("Hello, how are you?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
response = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(response)
```
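Since the adapter's job is accuracy recovery, it can be useful to eyeball what it actually changes. PEFT's `disable_adapter()` context manager temporarily bypasses the LoRA weights, so you can compare the plain INT4 model against INT4 + adapter on the same input (the prompt below is just an illustration):

```python
# Compare the plain INT4 model with INT4 + adapter on the same prompt.
prompt = "Explain the difference between a list and a tuple in Python."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# disable_adapter() temporarily bypasses the LoRA weights.
with model.disable_adapter():
    base_out = model.generate(**inputs, max_new_tokens=100)
lora_out = model.generate(**inputs, max_new_tokens=100)

print("INT4 only:     ", tokenizer.decode(base_out[0], skip_special_tokens=True))
print("INT4 + adapter:", tokenizer.decode(lora_out[0], skip_special_tokens=True))
```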
This adapter is part of the Ellora project: standardized recipes for enhancing LLM capabilities.
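For reference, the self-distillation recipe mentioned above can be sketched roughly as follows. This is not the actual training code: the Magpie pre-query prefix, LoRA rank and alpha, loss masking, and all hyperparameters below are assumptions for illustration. The idea is to synthesize prompts Magpie-style from the full-precision model, have that same full-precision teacher answer them, and fit a LoRA on the INT4 student so it reproduces the teacher's answers:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

tokenizer = AutoTokenizer.from_pretrained("Qwen/Qwen3-0.6B")

# Full-precision teacher: the reference behavior we want to recover.
teacher = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", torch_dtype=torch.float16, device_map="auto"
)

# INT4 student with trainable LoRA weights.
quant_config = BitsAndBytesConfig(
    load_in_4bit=True, bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
student = AutoModelForCausalLM.from_pretrained(
    "Qwen/Qwen3-0.6B", quantization_config=quant_config, device_map="auto"
)
student = prepare_model_for_kbit_training(student)
student = get_peft_model(student, LoraConfig(
    r=16, lora_alpha=32, target_modules="all-linear", task_type="CAUSAL_LM",
))  # rank/alpha are placeholders, not the values used for this adapter

# Magpie-style prompt synthesis: give the aligned teacher only the chat
# template's user-turn prefix and let it invent the instruction itself.
pre_query = "<|im_start|>user\n"  # assumed prefix for Qwen chat models
ids = tokenizer(pre_query, return_tensors="pt").to(teacher.device)
gen = teacher.generate(**ids, max_new_tokens=64, do_sample=True)
instruction = tokenizer.decode(
    gen[0][ids["input_ids"].shape[1]:], skip_special_tokens=True
)

# The teacher answers its own instruction at full precision; this
# completion becomes the distillation target.
chat = tokenizer.apply_chat_template(
    [{"role": "user", "content": instruction}],
    tokenize=False, add_generation_prompt=True,
)
prompt_ids = tokenizer(chat, return_tensors="pt").to(teacher.device)
answer = teacher.generate(**prompt_ids, max_new_tokens=256)

# One distillation step: train the quantized student to reproduce the
# teacher's output with a standard causal-LM loss. (A real run would mask
# the prompt tokens in the labels and loop over many generated samples.)
optimizer = torch.optim.AdamW(
    [p for p in student.parameters() if p.requires_grad], lr=1e-4
)
batch_ids = answer.to(student.device)
loss = student(input_ids=batch_ids, labels=batch_ids).loss
loss.backward()
optimizer.step()
```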