
"Built on openai/gpt-oss-20b"
ANITA-NEXT-20B-gpt-oss-ITA is a Thinking Model of the ANITA - Large Language Models family. The model is a fine-tuned version of openai/gpt-oss-20b (a fine-tuned OpenAI OSS model). This model version aims to be an Agentic-Ready Multilingual Model 🏁 (EN 🇺🇸 + ITA 🇮🇹) for further fine-tuning on specific tasks in Italian.
❗❗❗Use at your own risk. The model may generate hallucinations and incorrect, invented, offensive, unethical or dangerous responses. We are not responsible for any dangerous/offensive/criminal use. The model is released for research purposes only.❗❗❗
The 🌟ANITA project🌟 *(Advanced Natural-based interaction for the ITAlian language)* aims to provide Italian NLP researchers with an improved model for Italian language 🇮🇹 use cases.
The NEXT family includes four models:
- m-polignano/ANITA-NEXT-24B-Magistral-2506-ITA - General Purpose
- m-polignano/ANITA-NEXT-24B-Dolphin-Mistral-UNCENSORED-ITA - Uncensored
- m-polignano/ANITA-NEXT-24B-Magistral-2506-VISION-ITA - Vision-Language
- m-polignano/ANITA-NEXT-20B-gpt-oss-ITA - Agentic Ready
GGUF - OLLAMA: m-polignano/ANITA-NEXT-20B-gpt-oss-ITA-GGUF
Colab Demo: A100 - 40GB - Colab Notebook
The model runs on a single GPU, using 39.72GB of VRAM with 4-bit quantization.
Specifications
- Model developers:
  Ph.D. Marco Polignano - University of Bari Aldo Moro, Italy
  SWAP Research Group
- Variations: The released model was trained with supervised fine-tuning (SFT) using 4-bit QLoRA on instruction-based datasets. A DPO approach over the mlabonne/orpo-dpo-mix-40k dataset is used to align the model with human preferences for helpfulness and safety.
- Input: Models input text only.
- Language: Multilingual 🏁 + Italian 🇮🇹
- Output: Models generate text and code only - Agentic Ready.
- Model Architecture: OpenAI OSS architecture.
- Context length: 128k, but output quality degrades after 40k.
- Library Used: [Transformers 4.56.0.dev0](https://huggingface.co/docs/transformers/index)
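The context-length note above implies that the prompt and the requested generation together should ideally stay under about 40k tokens. Below is a minimal sketch of such a budget check. The names `check_context_budget`, `BudgetReport`, `SOFT_LIMIT` and `HARD_LIMIT` are hypothetical, and the exact token counts are an assumed literal reading of "40k" and "128k"; nothing here is part of the model's own tooling.

```python
from dataclasses import dataclass

# Assumed literal reading of the "40k" and "128k" figures in the list above.
SOFT_LIMIT = 40_000   # beyond this, quality is reported to degrade
HARD_LIMIT = 128_000  # stated maximum context length


@dataclass
class BudgetReport:
    total_tokens: int
    within_soft_limit: bool
    within_hard_limit: bool


def check_context_budget(prompt_tokens: int, max_new_tokens: int) -> BudgetReport:
    """Compare prompt length plus requested generation length with both limits."""
    if prompt_tokens < 0 or max_new_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = prompt_tokens + max_new_tokens
    return BudgetReport(
        total_tokens=total,
        within_soft_limit=total <= SOFT_LIMIT,
        within_hard_limit=total <= HARD_LIMIT,
    )
```

In practice, `prompt_tokens` would be the length of the tokenized prompt, and `max_new_tokens` the value passed to `generate`.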
Playground
There are several ways to use the model directly; choose one of the following to try it out.
System Prompt Template
L'utente ti chiederà di risolvere un compito o rispondere ad una domanda. Rispondi e ragiona usando la lingua della domanda, preferendo l'Italiano.
Scrivi il tuo flusso di pensiero (monologo interiore) nel canale di 'analysis'. Ragiona in modo disinvolto, scrivendo riflessioni e/o bozze, come se stessi lavorando a un esercizio su un foglio di carta.
Successivamente, scrivi la soluzione in modo chiaro, corretto, semplice ed esaustivo basandoti sul riassunto del tuo flusso di pensiero.
Se necessario, usa la notazione markdown per formattare la risposta.
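As a rough illustration of how the template above might be paired with a user request, the sketch below builds a chat-style message list. The helper name `build_messages` is hypothetical, and joining the four template lines with newlines is an assumption about the intended layout.

```python
# System prompt copied from the template above; joining the lines with "\n"
# is an assumption about how the original prompt is laid out.
SYSTEM_PROMPT = "\n".join([
    "L'utente ti chiederà di risolvere un compito o rispondere ad una domanda. "
    "Rispondi e ragiona usando la lingua della domanda, preferendo l'Italiano.",
    "Scrivi il tuo flusso di pensiero (monologo interiore) nel canale di 'analysis'. "
    "Ragiona in modo disinvolto, scrivendo riflessioni e/o bozze, "
    "come se stessi lavorando a un esercizio su un foglio di carta.",
    "Successivamente, scrivi la soluzione in modo chiaro, corretto, semplice ed "
    "esaustivo basandoti sul riassunto del tuo flusso di pensiero.",
    "Se necessario, usa la notazione markdown per formattare la risposta.",
])


def build_messages(user_prompt: str, system_prompt: str = SYSTEM_PROMPT) -> list[dict[str, str]]:
    """Return a two-turn message list: the system prompt, then the user request."""
    if not user_prompt.strip():
        raise ValueError("user_prompt must not be empty")
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]
```

A list built this way, for example `build_messages("Chi è Carlo Magno?")`, has the shape expected by a tokenizer's `apply_chat_template` method.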
Transformers
For direct use with transformers, you can get started with the following steps.
First, install transformers and the related dependencies with pip:
pip install -U --no-deps bitsandbytes accelerate xformers transformers peft trl cut_cross_entropy unsloth_zoo
pip install sentencepiece protobuf "datasets>=3.4.1,<4.0.0" "huggingface_hub>=0.34.0" hf_transfer
Now you can start using the model directly.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

# 4-bit NF4 quantization with double quantization and bfloat16 compute
nf4_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_use_double_quant=True,
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model_dir = "m-polignano/ANITA-NEXT-20B-gpt-oss-ITA"
tokenizer = AutoTokenizer.from_pretrained(model_dir, use_fast=True)
model = AutoModelForCausalLM.from_pretrained(
    model_dir,
    quantization_config=nf4_config,
    device_map="auto",
    torch_dtype=torch.bfloat16,
)

# Method 1: build the prompt with the chat template, then call generate()
system_prompt = '''L'utente ti chiederà di risolvere un compito o rispondere ad una domanda. Rispondi e ragiona usando la lingua della domanda, preferendo l'Italiano.
Scrivi il tuo flusso di pensiero (monologo interiore) nel canale di 'analysis'. Ragiona in modo disinvolto, scrivendo riflessioni e/o bozze, come se stessi lavorando a un esercizio su un foglio di carta.
Successivamente, scrivi la soluzione in modo chiaro, corretto, semplice ed esaustivo basandoti sul riassunto del tuo flusso di pensiero.
Se necessario, usa la notazione markdown per formattare la risposta.'''

messages = [
    {"role": "system", "content": system_prompt},
    {"role": "user", "content": "Chi è Carlo Magno?"},
]

# Render the conversation to text, then tokenize it without adding extra special tokens
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = tokenizer(prompt, return_tensors="pt", add_special_tokens=False)

# Move the input tensors to the device that holds the model
inputs = inputs.to(model.device)

outputs = model.generate(**inputs, max_new_tokens=32786, do_sample=True, top_p=0.9, temperature=0.7)

# The decoded string contains the prompt, the reasoning and the final answer
results = tokenizer.batch_decode(outputs)[0]
print(results)
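The decoded string printed above contains the prompt, the reasoning and the answer in one piece. The sketch below shows one possible way to separate the 'analysis' channel from the final answer. It assumes the output uses the gpt-oss style markers `<|channel|>`, `<|message|>`, `<|end|>` and `<|return|>`; that is an assumption about this fine-tuned model's chat template, and the helper name `split_channels` is hypothetical.

```python
import re

# Assumed marker layout: <|channel|>NAME<|message|>TEXT followed by
# <|end|>, <|return|>, the next <|start|>, or the end of the string.
CHANNEL_PATTERN = re.compile(
    r"<\|channel\|>(\w+)<\|message\|>(.*?)(?=<\|end\|>|<\|return\|>|<\|start\|>|\Z)",
    re.DOTALL,
)


def split_channels(decoded: str) -> dict[str, str]:
    """Map each channel name found in the decoded text to its message text."""
    channels: dict[str, str] = {}
    for name, text in CHANNEL_PATTERN.findall(decoded):
        # If a channel appears more than once, the last message wins.
        channels[name] = text.strip()
    return channels
```

Under these assumptions, `split_channels(results).get("final")` would return only the answer. If the markers differ in practice, the pattern needs to be adapted.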
Citation instructions
@misc{polignano2024advanced,
title={Advanced Natural-based interaction for the ITAlian language: LLaMAntino-3-ANITA},
author={Marco Polignano and Pierpaolo Basile and Giovanni Semeraro},
year={2024},
eprint={2405.07101},
archivePrefix={arXiv},
primaryClass={cs.CL}
}
@misc{openai2025gptoss,
author = {{OpenAI}},
title = {Introducing gpt-oss},
howpublished = {\url{https://openai.com/en-EN/index/introducing-gpt-oss/}},
year = {2025},
month = aug,
day = {5},
note = {Accessed: 16 August 2025},
}