Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled
The Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled model has been distilled from the Qwen2.5-Coder-1.5B-Instruct-SFT model down to 1B parameters using a token-based knowledge distillation method.
Usage
Hugging Face
```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

# Left padding keeps prompts right-aligned for generation.
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."
text = f"System: {system}\nUser: {user}\nAssistant:"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```
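If the tokenizer ships a chat template (Qwen2.5 instruct-family tokenizers generally do), the prompt can also be built with `apply_chat_template` instead of the hand-rolled string above. Whether this distilled checkpoint retains that template is an assumption worth verifying; a minimal sketch, continuing from the snippet above:

```python
# Sketch: build the prompt via the tokenizer's chat template, assuming
# this checkpoint kept the Qwen2.5 instruct template.
messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True,  # append the assistant-turn marker
)
inputs = tokenizer(text, return_tensors="pt").to(model.device)
```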
Dataset
Training
Hyperparameters
| Hyperparameter | Value |
|---|---|
| Base Model | bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT |
| Knowledge Distillation Method | Token-based |
| Task Type | CAUSAL_LM |
| Number of Epochs | 11 |
| Batch Size | 12 |
| Gradient Accumulation Steps | 2 |
| Effective Batch Size | 24 (12 × 2) |
| Learning Rate | 5e-5 |
| Optimizer | AdamW |
| Precision | BF16 Mixed Precision |
| Evaluation Strategy | epoch |
| Max Sequence Length | 256 tokens |
| Logging Steps | once per epoch |
| Save Checkpoint Steps | every 10000 steps |
| Experiment Tracking | MLflow (local) |
| Experiment Name | StudentKnowledgeDistillation |
| MLflow Run Name | StudentKD |
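As a hedged illustration, the table above maps onto a `transformers` `TrainingArguments` configuration roughly like the sketch below; the output directory and the trainer wiring are assumptions, since the card does not publish the training script:

```python
import mlflow
from transformers import TrainingArguments

mlflow.set_experiment("StudentKnowledgeDistillation")  # experiment name above

training_args = TrainingArguments(
    output_dir="output",              # hypothetical path
    num_train_epochs=11,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,    # effective batch size 24 (12 × 2)
    learning_rate=5e-5,
    optim="adamw_torch",              # AdamW
    bf16=True,                        # BF16 mixed precision
    eval_strategy="epoch",            # evaluation_strategy on transformers < 4.41
    save_steps=10_000,                # checkpoint every 10000 steps
    report_to=["mlflow"],             # local MLflow tracking
    run_name="StudentKD",
)
```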
Knowledge Distillation Configuration
| Parameter | Value |
|---|---|
| Distillation Weight | 0.3 |
| Temperature | 0.5 |
| Loss Reduction | batchmean |
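For orientation, a minimal sketch of a token-level distillation loss using these values follows; it uses the common soft/hard-target formulation, which is an assumption, since the card does not publish the loss code:

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      alpha=0.3, temperature=0.5):
    """Token-level KD: blend soft-target KL with hard-label cross-entropy."""
    # Soft targets: KL divergence between temperature-scaled distributions,
    # with the "batchmean" reduction listed in the table above.
    kd = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    # Hard targets: standard causal-LM cross-entropy on the gold tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    return alpha * kd + (1 - alpha) * ce
```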
Dataset
- Train/Test Split: 90% / 10%
- Random Seed: 42
- Train Batched: True
- Eval Batched: True
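A sketch of how such a split is typically produced with the `datasets` library; the dataset identifier is a placeholder, since this section does not name the corpus:

```python
from datasets import load_dataset

# Placeholder dataset id; the card does not name the corpus here.
raw = load_dataset("your-org/your-sft-dataset", split="train")

# 90% / 10% split with the seed listed above.
splits = raw.train_test_split(test_size=0.1, seed=42)
train_ds, eval_ds = splits["train"], splits["test"]
```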
Tokenizer Configuration
- Truncation: Enabled (`max_length=256`)
- Masked Language Modeling (MLM): False
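Under the same assumptions, these settings correspond to a batched tokenization map plus a causal-LM data collator (`mlm=False`); the `text` column name is hypothetical, and `tokenizer`, `train_ds`, and `eval_ds` come from the earlier sketches:

```python
from transformers import DataCollatorForLanguageModeling

def tokenize_fn(batch):
    # Truncate to the 256-token maximum listed above; "text" is a
    # hypothetical column name.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# Batched map on both splits, matching "Train/Eval Batched: True" above.
train_tok = train_ds.map(tokenize_fn, batched=True)
eval_tok = eval_ds.map(tokenize_fn, batched=True)

# mlm=False means labels are the (shifted) input ids, i.e. causal LM.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)
```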
Speeds, Sizes, Times
- Total Training Time: ~7 hours
- Checkpoint Frequency: every 10000 steps
- Checkpoint Steps: `checkpoint-10000`, `checkpoint-13200` (final checkpoint)
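With the training output on disk, a specific checkpoint can be loaded directly; `output/` below is a hypothetical local path:

```python
from transformers import AutoModelForCausalLM

# Hypothetical local path to the final checkpoint listed above.
model = AutoModelForCausalLM.from_pretrained("output/checkpoint-13200")
```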
Compute Infrastructure
Hardware:
- GPU: 1 × NVIDIA L40S (48 GB VRAM)
- RAM: 94 GB
- CPU: 16 vCPU
Software:
- OS: Ubuntu 22.04
- Frameworks: PyTorch 2.4.0
- CUDA Version: 12.4.1
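To confirm a runtime matches this environment, a quick check with stock PyTorch:

```python
import torch

print(torch.__version__)                  # expect 2.4.0
print(torch.version.cuda)                 # expect 12.4
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # expect NVIDIA L40S
```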
Licence
Links
Team
Contact
Citation
```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author = {Bunyamin Ergen},
  title  = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year   = {2025},
  month  = {04},
}
```
Model tree for bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled
- Base model: Qwen/Qwen2.5-1.5B
- Finetuned: Qwen/Qwen2.5-Coder-1.5B
- Finetuned: Qwen/Qwen2.5-Coder-1.5B-Instruct