Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled

The Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled model was distilled from the Qwen2.5-Coder-1.5B-Instruct-SFT model down to roughly 1B parameters (1.02B) using a token-based knowledge distillation method.


Table of Contents


Usage

Hugging Face

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

# Load the tokenizer and the distilled model.
tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."

# Build a plain prompt and generate a completion.
text = f"System: {system}\nUser: {user}\nAssistant:"
inputs = tokenizer(text, return_tensors="pt").to(model.device)

with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
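
If the tokenizer bundled with this repository ships Qwen's chat template, the prompt can also be built with apply_chat_template instead of the manual format above. The following continues from the snippet above and is only a sketch; whether the template matches the SFT training format is an assumption.

messages = [
    {"role": "system", "content": system},
    {"role": "user", "content": user},
]
# apply_chat_template formats the conversation and returns input ids
# (assumes the repo's tokenizer includes a chat template).
chat_ids = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    return_tensors="pt",
).to(model.device)

with torch.no_grad():
    out_ids = model.generate(chat_ids, max_new_tokens=512)
print(tokenizer.decode(out_ids[0], skip_special_tokens=True))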

Dataset


Training

Hyperparameters

  • Base Model: bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT
  • Knowledge Distillation Method: Token-based
  • Task Type: CAUSAL_LM
  • Number of Epochs: 11
  • Batch Size: 12
  • Gradient Accumulation Steps: 2
  • Effective Batch Size: 24 (12 × 2)
  • Learning Rate: 5e-5
  • Optimizer: AdamW
  • Precision: BF16 mixed precision
  • Evaluation Strategy: epoch
  • Max Sequence Length: 256 tokens
  • Logging: every epoch
  • Checkpoint Saving: every 10000 steps
  • Experiment Tracking: MLflow (local)
  • Experiment Name: StudentKnowledgeDistillation
  • MLflow Run Name: StudentKD
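
For reference, a minimal sketch of how these settings might map onto Hugging Face TrainingArguments. The output directory is a placeholder and the exact trainer wiring of the original run is not documented here; on older transformers versions eval_strategy is named evaluation_strategy.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="qwen2.5-coder-1b-distilled",  # placeholder path
    num_train_epochs=11,
    per_device_train_batch_size=12,
    gradient_accumulation_steps=2,            # effective batch size 24
    learning_rate=5e-5,
    bf16=True,                                # BF16 mixed precision
    eval_strategy="epoch",
    logging_strategy="epoch",
    save_strategy="steps",
    save_steps=10000,
    report_to=["mlflow"],                     # local MLflow tracking
    run_name="StudentKD",
    seed=42,
)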

Knowledge Distillation Configuration

  • Distillation Weight: 0.3
  • Temperature: 0.5
  • Loss Reduction: batchmean
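
These values suggest a loss of the following form. This is a minimal sketch that assumes the distillation weight blends a temperature-scaled KL-divergence term with the standard causal-LM cross-entropy; the exact formulation used in training is not documented here.

import torch.nn.functional as F

def kd_loss(student_logits, teacher_logits, labels, weight=0.3, temperature=0.5):
    # KL divergence between temperature-scaled teacher and student distributions.
    kl = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)
    # Ordinary cross-entropy on the ground-truth next tokens.
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)),
        labels.view(-1),
        ignore_index=-100,
    )
    # Assumed blending: `weight` on the distillation term, the remainder on cross-entropy.
    return weight * kl + (1.0 - weight) * ce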

Dataset

  • Train/Test Split: 90%/10%
  • Random Seed: 42
  • Train Batched: True
  • Eval Batched: True
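
With 🤗 Datasets, the split and batched mapping above might look like the sketch below; the dataset identifier is a placeholder and preprocess refers to the tokenization function sketched under Tokenizer Configuration.

from datasets import load_dataset

raw = load_dataset("your-dataset-id", split="train")      # placeholder identifier
splits = raw.train_test_split(test_size=0.1, seed=42)     # 90% / 10% split, seed 42
train_ds = splits["train"].map(preprocess, batched=True)  # Train Batched: True
eval_ds = splits["test"].map(preprocess, batched=True)    # Eval Batched: True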

Tokenizer Configuration

  • Truncation: Enabled (max_length=256)
  • Masked Language Modeling (MLM): False
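
A sketch of the matching tokenization function and data collator, assuming DataCollatorForLanguageModeling with mlm=False (standard causal-LM collation); the "text" column name is a placeholder.

from transformers import DataCollatorForLanguageModeling

def preprocess(batch):
    # Truncate every example to 256 tokens; "text" is a placeholder column name.
    return tokenizer(batch["text"], truncation=True, max_length=256)

# mlm=False produces causal-LM labels rather than masked-LM labels.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=False)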

Speeds, Sizes, Times

  • Total Training Time: ~7 hours
  • Checkpoint Frequency: every 10000 steps
  • Checkpoint Steps:
    • checkpoint-10000
    • checkpoint-13200 (final checkpoint)

Compute Infrastructure

Hardware:

  • GPU: 1 × NVIDIA L40S (48 GB VRAM)
  • RAM: 94 GB
  • CPU: 16 vCPU

Software:

  • OS: Ubuntu 22.04
  • Framework: PyTorch 2.4.0
  • CUDA Version: 12.4.1

Licence


Links


Team


Contact


Citation

@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author       = {Bunyamin Ergen},
  title        = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year         = {2025},
  month        = {04},
}
