---
base_model: bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT
library_name: transformers
language:
- en
tags:
- code
- codeqwen
- chat
- qwen
- qwen-coder
license: gpl-3.0
datasets:
- bunyaminergen/Stable-Code-Python-SFT
pipeline_tag: text-generation
license_link: https://huggingface.co/bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled/blob/main/LICENSE
---

# Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled

Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled was distilled from the Qwen2.5-Coder-1.5B-Instruct-SFT model down to 1B parameters using token-based knowledge distillation.

---

### Table of Contents

- [Usage](#usage)
- [Dataset](#dataset)
- [Training](#training)
- [Licence](#licence)
- [Links](#links)
- [Team](#team)
- [Contact](#contact)
- [Citation](#citation)

---

### Usage

#### Hugging Face

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo = "bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled"

tokenizer = AutoTokenizer.from_pretrained(repo, padding_side="left")
model = AutoModelForCausalLM.from_pretrained(
    repo,
    device_map="auto",
    torch_dtype="auto",
).eval()

# Build a simple system/user prompt and generate a completion.
system = "You are a senior Python developer."
user = "Give me a Python implementation of bubble sort."
text = f"System: {system}\nUser: {user}\nAssistant:"

inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    out_ids = model.generate(**inputs, max_new_tokens=512)

print(tokenizer.decode(out_ids[0], skip_special_tokens=True))
```

---

### Dataset

- [bunyaminergen/Stable-Code-Python-SFT](https://huggingface.co/datasets/bunyaminergen/Stable-Code-Python-SFT)

---

### Training

#### Hyperparameters

| Hyperparameter                | Value                                           |
|-------------------------------|-------------------------------------------------|
| Base Model                    | `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` |
| Knowledge Distillation Method | Token-based                                     |
| Task Type                     | `CAUSAL_LM`                                     |
| Number of Epochs              | `11`                                            |
| Batch Size                    | `12`                                            |
| Gradient Accumulation Steps   | `2`                                             |
| Effective Batch Size          | `24` (12 × 2)                                   |
| Learning Rate                 | `5e-5`                                          |
| Optimizer                     | `AdamW`                                         |
| Precision                     | `BF16 Mixed Precision`                          |
| Evaluation Strategy           | `epoch`                                         |
| Max Sequence Length           | `256 tokens`                                    |
| Logging Steps                 | every `epoch`                                   |
| Save Checkpoint Steps         | every `10000` steps                             |
| Experiment Tracking           | `MLflow` (local)                                |
| Experiment Name               | `StudentKnowledgeDistillation`                  |
| MLflow Run Name               | `StudentKD`                                     |

#### Knowledge Distillation Configuration

| Parameter           | Value       |
|---------------------|-------------|
| Distillation Weight | `0.3`       |
| Temperature         | `0.5`       |
| Loss Reduction      | `batchmean` |

#### Dataset

- **Train/Test Split:** `90%/10%`
- **Random Seed:** `42`
- **Train Batched:** `True`
- **Eval Batched:** `True`

#### Tokenizer Configuration

- **Truncation:** Enabled (`max_length=256`)
- **Masked Language Modeling (MLM):** `False`

#### Speeds, Sizes, Times

- **Total Training Time:** ~7 hours
- **Checkpoint Frequency:** every `10000` steps
- **Checkpoint Steps:**
  - `checkpoint-10000`
  - `checkpoint-13200` *(final checkpoint)*

#### Compute Infrastructure

**Hardware:**

- GPU: **1 × NVIDIA L40S (48 GB VRAM)**
- RAM: **94 GB**
- CPU: **16 vCPU**

**Software:**

- OS: **Ubuntu 22.04**
- Frameworks: **PyTorch 2.4.0**
- CUDA Version: **12.4.1**
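#### Distillation Loss (Reference Sketch)

The training code itself is not reproduced in this card. As a point of reference only, the sketch below shows one common way to implement a token-based distillation objective with the configuration listed above (Distillation Weight `0.3`, Temperature `0.5`, `batchmean` KL reduction). The function name, the hard-label cross-entropy mixing, and the `temperature**2` scaling are illustrative assumptions, not a verbatim excerpt from the training pipeline.

```python
import torch
import torch.nn.functional as F


def distillation_loss(
    student_logits: torch.Tensor,  # (batch, seq_len, vocab_size)
    teacher_logits: torch.Tensor,  # (batch, seq_len, vocab_size)
    labels: torch.Tensor,          # (batch, seq_len), -100 marks ignored positions
    kd_weight: float = 0.3,        # "Distillation Weight" from the table above
    temperature: float = 0.5,      # "Temperature" from the table above
) -> torch.Tensor:
    """Hypothetical token-level KD loss: weighted mix of soft KL and hard CE terms."""
    vocab_size = student_logits.size(-1)

    # Hard-label term: standard next-token cross-entropy against the ground truth.
    ce_loss = F.cross_entropy(
        student_logits.reshape(-1, vocab_size),
        labels.reshape(-1),
        ignore_index=-100,
    )

    # Soft-label term: KL divergence between temperature-scaled student and
    # teacher token distributions, reduced with `batchmean` as configured.
    kd_loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * (temperature ** 2)

    # Mix the two terms; here the distillation weight scales the KD term.
    return kd_weight * kd_loss + (1.0 - kd_weight) * ce_loss
```

In such a setup, the teacher logits would come from a frozen forward pass of `bunyaminergen/Qwen2.5-Coder-1.5B-Instruct-SFT` on the same batch, and only the student would receive gradient updates.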
---

### Licence

- [LICENSE](LICENSE)

---

### Links

- [Github](https://github.com/bunyaminergen/)
- [Website](https://bunyaminergen.com)
- [Linkedin](https://www.linkedin.com/in/bunyaminergen)

---

### Team

- [Bunyamin Ergen](https://www.linkedin.com/in/bunyaminergen)

---

### Contact

- [Mail](mailto:info@bunyaminergen.com)

---

### Citation

```bibtex
@software{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled,
  author = {Bunyamin Ergen},
  title  = {{Qwen2.5-Coder-1.5B-Instruct-SFT-Distilled}},
  year   = {2025},
  month  = {04},
}
```

---