---
license: apache-2.0
language:
- en
- fr
- de
- es
- it
- pt
base_model:
- alamios/Qwenstral-Small-3.1-0.5B
datasets:
- alamios/Mistral-Small-24B-Instruct-2501-Conversations
pipeline_tag: text-generation
library_name: transformers
tags:
- qwen
- qwen2.5
- mistral
- mistral-small
- mistral-small-3.1
---

# Mistral-Small-3.1-DRAFT-0.5B

This model is meant to be used as a draft model for speculative decoding with [mistralai/Mistral-Small-3.1-24B-Instruct-2503](https://huggingface.co/mistralai/Mistral-Small-3.1-24B-Instruct-2503) or [mistralai/Mistral-Small-24B-Instruct-2501](https://huggingface.co/mistralai/Mistral-Small-24B-Instruct-2501).
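
In speculative decoding, the small draft model cheaply proposes a few tokens ahead. The large target model then checks them, keeps the longest prefix it agrees with, and adds one token of its own. The sketch below shows the greedy variant in plain Python. The two "models" are hypothetical toy functions standing in for this draft model and the 24B target, and names such as `speculative_decode` are illustrative only. In practice an inference library does this for you; for example, Hugging Face transformers offers assisted generation through the `assistant_model` argument of `generate`.

```python
from typing import Callable, List

# Hypothetical interface: a "model" maps a token sequence to its greedy next token.
NextToken = Callable[[List[int]], int]


def greedy_decode(model: NextToken, prompt: List[int], max_new: int) -> List[int]:
    """Plain greedy decoding with a single model (the reference output)."""
    seq = list(prompt)
    for _ in range(max_new):
        seq.append(model(seq))
    return seq[len(prompt):]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[int], max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens, the target verifies."""
    seq = list(prompt)
    generated = 0
    while generated < max_new:
        # 1) The draft model proposes up to k tokens autoregressively (the cheap part).
        proposal: List[int] = []
        ctx = list(seq)
        for _ in range(min(k, max_new - generated)):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2) The target model checks each proposed position. Real implementations score
        #    all positions in one forward pass; here it is called per position for clarity.
        accepted: List[int] = []
        for tok in proposal:
            expected = target(seq + accepted)
            if tok == expected:
                accepted.append(tok)
            else:
                # First mismatch: keep the target's own token and end this round.
                accepted.append(expected)
                break
        else:
            # All draft tokens were accepted, so the target contributes one bonus token.
            if generated + len(accepted) < max_new:
                accepted.append(target(seq + accepted))
        seq.extend(accepted)
        generated += len(accepted)
    return seq[len(prompt):]


def target_model(seq: List[int]) -> int:
    # Toy target: next token is the sum of the last two tokens, mod 10.
    return (seq[-1] + seq[-2]) % 10 if len(seq) >= 2 else 1


def draft_model(seq: List[int]) -> int:
    # Toy draft: agrees with the target except right after a 3, where it guesses 0.
    return 0 if seq[-1] == 3 else target_model(seq)


print(speculative_decode(target_model, draft_model, [1, 1], max_new=10, k=4))
# → [2, 3, 5, 8, 3, 1, 4, 5, 9, 4]
```

Every accepted token is one the target itself would have chosen, so the output matches the target's greedy decoding exactly. The speed-up comes from verifying several draft tokens per target forward pass, which this toy sketch does not model.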

# Data info

The training data consists of outputs generated by Mistral and covers all kinds of tasks drawn from various datasets in English, French, German, Spanish, Italian and Portuguese. The model was trained for 2 epochs on 20k unique examples, for a total of 12 million tokens per epoch.
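
The training figures above imply a rough average example length and a total training budget. The quick check below assumes that "tokens per epoch" counts every token of the 20k examples exactly once; the variable names are illustrative only.

```python
# Figures stated in the data description.
unique_examples = 20_000
tokens_per_epoch = 12_000_000
epochs = 2

# Assumption: each epoch passes over all 20k examples once.
avg_tokens_per_example = tokens_per_epoch / unique_examples
total_tokens_seen = tokens_per_epoch * epochs

print(avg_tokens_per_example)  # 600.0
print(total_tokens_seen)       # 24000000
```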