---
base_model: Qwen/Qwen3-0.6B-Base
tags:
- knowledge-distillation
- full-fine-tuning
- mmlu
- qwen
- safetensors
library_name: transformers
license: apache-2.0
datasets:
- cais/mmlu
---

# Distilled Qwen Model - Full Fine-tuning

This model was created through knowledge distillation from **Qwen/Qwen3-8B-Base** (teacher) to **Qwen/Qwen3-0.6B-Base** (student) using full-parameter fine-tuning.

## Model Details

- **Base Model**: Qwen/Qwen3-0.6B-Base
- **Teacher Model**: Qwen/Qwen3-8B-Base
- **Method**: Knowledge distillation with full fine-tuning
- **Dataset**: MMLU (Massive Multitask Language Understanding)
- **Distillation Alpha**: 0.7 (see the sketch under *Distillation Objective* below)
- **Temperature**: 4.0
- **Total Parameters**: ~600M (all parameters updated)
- **Format**: Safetensors (safer and faster to load than pickle-based PyTorch checkpoints)

## Training Details

- **Training Samples**: 285
- **Epochs**: 30
- **Batch Size**: 2
- **Learning Rate**: 5e-05
- **Final Eval Loss**: N/A

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the distilled model directly from the Hub
tokenizer = AutoTokenizer.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")
model = AutoModelForCausalLM.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")

# Generate text
inputs = tokenizer(
    "Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Technical Notes

- Model weights are stored in **safetensors format** for improved security and faster loading
- Compatible with any version of the Hugging Face `transformers` library that supports safetensors
- Safetensors enables memory-mapped, zero-copy loading; inference speed once the model is in memory is unchanged

## Evaluation

This model should be evaluated on multiple-choice question answering (MCQA) tasks by comparing the log-likelihood the model assigns to each answer option, as implemented in the evaluation framework.
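The snippet below is a minimal, self-contained sketch of that procedure (a generic illustration, not the evaluation framework itself): each answer letter is scored by the total log-probability the model assigns to it as a continuation of the prompt, and the highest-scoring option is the prediction.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CarlOwOs/distilled-qwen3-0.6b-full-mmlu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def option_log_likelihood(prompt: str, option: str) -> float:
    """Sum of token log-probabilities of `option` conditioned on `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i + 1, so shift by one position.
    option_logits = logits[0, prompt_ids.shape[1] - 1 : -1]
    log_probs = option_logits.log_softmax(dim=-1)
    token_log_probs = log_probs.gather(1, option_ids[0].unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum().item()

prompt = (
    "Question: What is the capital of France?\n"
    "A. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:"
)
scores = {letter: option_log_likelihood(prompt, f" {letter}") for letter in "ABCD"}
print(scores, "->", max(scores, key=scores.get))
```

Evaluation harnesses differ on details such as whether they score the letter alone or the full answer text, and whether they length-normalize the log-likelihoods, so numbers from this sketch may not match a specific framework.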
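## Distillation Objective

For context, the **Distillation Alpha** (0.7) and **Temperature** (4.0) hyperparameters listed under Model Details are conventionally combined as sketched below: a KL-divergence term over temperature-softened teacher and student distributions, blended with the ordinary cross-entropy loss on the labels. This is a generic illustration of that recipe, not the exact training code used for this model.

```python
import torch
import torch.nn.functional as F

# Hypothetical illustration only; the actual training script is not part of this repo.
ALPHA = 0.7        # weight on the soft-target (distillation) term, from Model Details
TEMPERATURE = 4.0  # softmax temperature applied to both teacher and student logits

def distillation_loss(
    student_logits: torch.Tensor,  # [batch, seq_len, vocab]
    teacher_logits: torch.Tensor,  # [batch, seq_len, vocab]
    labels: torch.Tensor,          # [batch, seq_len], padding marked with -100
) -> torch.Tensor:
    vocab = student_logits.size(-1)
    # Soft targets: KL divergence between temperature-softened distributions,
    # flattened so "batchmean" averages over every token position.
    kd = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=-1).view(-1, vocab),
        F.softmax(teacher_logits / TEMPERATURE, dim=-1).view(-1, vocab),
        reduction="batchmean",
    ) * TEMPERATURE**2  # T^2 keeps soft-target gradients on the hard-target scale
    # Hard targets: standard next-token cross-entropy (padding positions ignored).
    ce = F.cross_entropy(student_logits.view(-1, vocab), labels.view(-1), ignore_index=-100)
    return ALPHA * kd + (1.0 - ALPHA) * ce
```

The `T^2` factor is the usual Hinton-style correction that keeps the gradient magnitude of the softened KL term comparable to the cross-entropy term as the temperature changes.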