---
base_model: Qwen/Qwen3-0.6B-Base
tags:
- knowledge-distillation
- full-fine-tuning
- mmlu
- qwen
- safetensors
library_name: transformers
license: apache-2.0
datasets:
- cais/mmlu
---

# Distilled Qwen Model - Full Fine-tuning

This model was created through knowledge distillation from **Qwen/Qwen3-8B-Base** (teacher) to **Qwen/Qwen3-0.6B-Base** (student) using full-parameter fine-tuning.

## Model Details

- **Base Model**: Qwen/Qwen3-0.6B-Base
- **Teacher Model**: Qwen/Qwen3-8B-Base
- **Method**: Knowledge distillation with full fine-tuning
- **Dataset**: MMLU (Massive Multitask Language Understanding)
- **Distillation Alpha**: 0.7 (see the sketch under *Distillation Objective* below)
- **Temperature**: 4.0
- **Total Parameters**: ~600M (all parameters updated)
- **Format**: Safetensors (safer and faster to load than pickle-based PyTorch checkpoints)

## Training Details

- **Training Samples**: 285
- **Epochs**: 30
- **Batch Size**: 2
- **Learning Rate**: 5e-05
- **Final Eval Loss**: N/A

## Usage

```python
from transformers import AutoTokenizer, AutoModelForCausalLM

# Load the distilled model directly from the Hub
tokenizer = AutoTokenizer.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")
model = AutoModelForCausalLM.from_pretrained("CarlOwOs/distilled-qwen3-0.6b-full-mmlu")

# Generate text
inputs = tokenizer(
    "Question: What is the capital of France?\nA. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:",
    return_tensors="pt",
)
outputs = model.generate(**inputs, max_new_tokens=16)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

## Technical Notes

- Model weights are stored in **safetensors format** for improved security and faster loading
- Compatible with any version of the Hugging Face `transformers` library that supports safetensors
- Safetensors enables memory-mapped, zero-copy loading; inference speed once the model is in memory is unchanged

## Evaluation

This model should be evaluated on multiple-choice question answering (MCQA) tasks by comparing the log-likelihood the model assigns to each answer option, as implemented in the evaluation framework.
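The snippet below is a minimal, self-contained sketch of that procedure (a generic illustration, not the evaluation framework itself): each answer letter is scored by the total log-probability the model assigns to it as a continuation of the prompt, and the highest-scoring option is the prediction.

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "CarlOwOs/distilled-qwen3-0.6b-full-mmlu"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)
model.eval()

def option_log_likelihood(prompt: str, option: str) -> float:
    """Sum of token log-probabilities of `option` conditioned on `prompt`."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    option_ids = tokenizer(option, add_special_tokens=False, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, option_ids], dim=1)
    with torch.no_grad():
        logits = model(input_ids).logits
    # Logits at position i predict token i + 1, so shift by one position.
    option_logits = logits[0, prompt_ids.shape[1] - 1 : -1]
    log_probs = option_logits.log_softmax(dim=-1)
    token_log_probs = log_probs.gather(1, option_ids[0].unsqueeze(-1)).squeeze(-1)
    return token_log_probs.sum().item()

prompt = (
    "Question: What is the capital of France?\n"
    "A. London\nB. Berlin\nC. Paris\nD. Madrid\nAnswer:"
)
scores = {letter: option_log_likelihood(prompt, f" {letter}") for letter in "ABCD"}
print(scores, "->", max(scores, key=scores.get))
```

Evaluation harnesses differ on details such as whether they score the letter alone or the full answer text, and whether they length-normalize the log-likelihoods, so numbers from this sketch may not match a specific framework.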
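## Distillation Objective

For context, the **Distillation Alpha** (0.7) and **Temperature** (4.0) hyperparameters listed under Model Details are conventionally combined as sketched below: a KL-divergence term over temperature-softened teacher and student distributions, blended with the ordinary cross-entropy loss on the labels. This is a generic illustration of that recipe, not the exact training code used for this model.

```python
import torch
import torch.nn.functional as F

# Hypothetical illustration only; the actual training script is not part of this repo.
ALPHA = 0.7        # weight on the soft-target (distillation) term, from Model Details
TEMPERATURE = 4.0  # softmax temperature applied to both teacher and student logits

def distillation_loss(
    student_logits: torch.Tensor,  # [batch, seq_len, vocab]
    teacher_logits: torch.Tensor,  # [batch, seq_len, vocab]
    labels: torch.Tensor,          # [batch, seq_len], padding marked with -100
) -> torch.Tensor:
    vocab = student_logits.size(-1)
    # Soft targets: KL divergence between temperature-softened distributions,
    # flattened so "batchmean" averages over every token position.
    kd = F.kl_div(
        F.log_softmax(student_logits / TEMPERATURE, dim=-1).view(-1, vocab),
        F.softmax(teacher_logits / TEMPERATURE, dim=-1).view(-1, vocab),
        reduction="batchmean",
    ) * TEMPERATURE**2  # T^2 keeps soft-target gradients on the hard-target scale
    # Hard targets: standard next-token cross-entropy (padding positions ignored).
    ce = F.cross_entropy(student_logits.view(-1, vocab), labels.view(-1), ignore_index=-100)
    return ALPHA * kd + (1.0 - ALPHA) * ce
```

The `T^2` factor is the usual Hinton-style correction that keeps the gradient magnitude of the softened KL term comparable to the cross-entropy term as the temperature changes.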