---
base_model: bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT
library_name: transformers
language:
- en
tags:
- code
- codeqwen
- chat
- qwen
- qwen-coder
license: gpl-3.0
datasets:
- bunyaminergen/Stable-Code-Python-SFT
pipeline_tag: text-generation
license_link: https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled/blob/main/LICENSE
---

# Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled

Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled was distilled from the Qwen2.5-Coder-1.5B-Instruct-SFT model down to 1B parameters using token-based knowledge distillation.

---

### Table of Contents

- [Usage](#usage)
- [Dataset](#dataset)
- [Training](#training)
- [Licence](#licence)
- [Links](#links)
- [Team](#team)
- [Contact](#contact)
- [Citation](#citation)

---

### Usage

#### Hugging Face

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

# Build a simple system/user prompt and generate a completion.
system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."
text = f"System: {system}\nUser: {user}\nAssistant:"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```

---

### Dataset

- [bunyaminergen/Stable-Code-Python-SFT](https://huggingface.co/datasets/bunyaminergen/Stable-Code-Python-SFT)

---

### Training

#### Hyperparameters

| Hyperparameter                | Value                                           |
|-------------------------------|-------------------------------------------------|
| Base Model                    | `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` |
| Knowledge Distillation Method | Token-based                                     |
| Task Type                     | `CAUSAL_LM`                                     |
| Number of Epochs              | `11`                                            |
| Batch Size                    | `12`                                            |
| Gradient Accumulation Steps   | `2`                                             |
| Effective Batch Size          | `24` (12 × 2)                                   |
| Learning Rate                 | `5e-5`                                          |
| Optimizer                     | `AdamW`                                         |
| Precision                     | `BF16 Mixed Precision`                          |
| Evaluation Strategy           | `epoch`                                         |
| Max Sequence Length           | `256 tokens`                                    |
| Logging Steps                 | every `epoch`                                   |
| Save Checkpoint Steps         | every `10000` steps                             |
| Experiment Tracking           | `MLflow` (local)                                |
| Experiment Name               | `StudentKnowledgeDistillation`                  |
| MLflow Run Name               | `StudentKD`                                     |

#### Knowledge Distillation Configuration

| Parameter           | Value       |
|---------------------|-------------|
| Distillation Weight | `0.3`       |
| Temperature         | `0.5`       |
| Loss Reduction      | `batchmean` |

#### Dataset

- **Train/Test Split:** `90%/10%`
- **Random Seed:** `42`
- **Train Batched:** `True`
- **Eval Batched:** `True`

#### Tokenizer Configuration

- **Truncation:** Enabled (`max_length=256`)
- **Masked Language Modeling (MLM):** `False`

#### Speeds, Sizes, Times

- **Total Training Time:** ~7 hours
- **Checkpoint Frequency:** every `10000` steps
- **Checkpoint Steps:**
  - `checkpoint-10000`
  - `checkpoint-13200` *(final checkpoint)*

#### Compute Infrastructure

**Hardware:**

- GPU: **1 × NVIDIA L40S (48 GB VRAM)**
- RAM: **94 GB**
- CPU: **16 vCPU**

**Software:**

- OS: **Ubuntu 22.04**
- Frameworks: **PyTorch 2.4.0**
- CUDA Version: **12.4.1**
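#### Distillation Loss (Reference Sketch)

The training code itself is not reproduced in this card. As a point of reference only, the sketch below shows one common way to implement a token-based distillation objective with the configuration listed above (Distillation Weight `0.3`, Temperature `0.5`, `batchmean` KL reduction). The function name, the hard-label cross-entropy mixing, and the `temperature**2` scaling are illustrative assumptions, not a verbatim excerpt from the training pipeline.

```python
import torch
import torch.nn.functional as F


def distillation_loss(
    student_logits: torch.Tensor,  # (batch, seq_len, vocab_size)
    teacher_logits: torch.Tensor,  # (batch, seq_len, vocab_size)
    labels: torch.Tensor,          # (batch, seq_len), -100 marks ignored positions
    kd_weight: float = 0.3,        # "Distillation Weight" from the table above
    temperature: float = 0.5,      # "Temperature" from the table above
) -> torch.Tensor:
    """Hypothetical token-level KD loss: weighted mix of soft KL and hard CE terms."""
    vocab_size = student_logits.size(-1)

    # Hard-label term: standard next-token cross-entropy against the ground truth.
    ce_loss = F.cross_entropy(
        student_logits.reshape(-1, vocab_size),
        labels.reshape(-1),
        ignore_index=-100,
    )

    # Soft-label term: KL divergence between temperature-scaled student and
    # teacher token distributions, reduced with `batchmean` as configured.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Mix the two terms; here the distillation weight scales the KD term.
    return kd_weight * kd_loss + (1.0 - kd_weight) * ce_loss
```

In such a setup, the teacher logits would come from a frozen forward pass of `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` on the same batch, and only the student would receive gradient updates.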
---

### Licence

- [LICENSE](LICENSE)

---

### Links

- [Github](https://github.com/bunyaminergen/)
- [Website](https://bunyaminergen.com)
- [Linkedin](https://www.linkedin.com/in/bunyaminergen)

---

### Team

- [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen)

---

### Contact

- [Mail](mailto:info@bunyaminergen.com)

---

### Citation

```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author = {Bunyamin Ergen},
  title  = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year   = {2025},
  month  = {04},
}
```

---