Include expected behaviour for Reasoning ON
README.md
CHANGED
@@ -90,6 +90,7 @@ Llama-3.1-Nemotron-Nano-8B-v1 is a general purpose reasoning and chat model inte
 2. We recommend setting temperature to `0.6`, and Top P to `0.95` for Reasoning ON mode
 3. We recommend using greedy decoding for Reasoning OFF mode
 4. We have provided a list of prompts to use for evaluation for each benchmark where a specific template is required
+5. The model will include `<think></think>` if no reasoning was necessary in Reasoning ON mode; this is expected behaviour
 
 You can try this model out through the preview API, using this link: [Llama-3.1-Nemotron-Nano-8B-v1](https://build.nvidia.com/nvidia/llama-3_1-nemotron-nano-8b-v1).
 
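Below is a minimal sketch of how these recommendations might be applied when calling the preview API. It assumes an OpenAI-compatible endpoint at `https://integrate.api.nvidia.com/v1`, the model identifier `nvidia/llama-3.1-nemotron-nano-8b-v1`, and that Reasoning ON/OFF is toggled through the system prompt; those details are illustrative assumptions, not part of this change.

```python
# Sketch only: the endpoint, model id, and system-prompt toggle below are
# assumptions for illustration; adjust them to match the model card.
from openai import OpenAI

client = OpenAI(
    base_url="https://integrate.api.nvidia.com/v1",  # assumed preview API endpoint
    api_key="YOUR_NVIDIA_API_KEY",
)

def ask(prompt: str, reasoning: bool) -> str:
    # Reasoning ON: temperature 0.6, top_p 0.95 (per the README recommendation).
    # Reasoning OFF: greedy decoding, approximated here with temperature 0.
    params = {"temperature": 0.6, "top_p": 0.95} if reasoning else {"temperature": 0.0}
    response = client.chat.completions.create(
        model="nvidia/llama-3.1-nemotron-nano-8b-v1",  # assumed model id
        messages=[
            # Assumed toggle: the system prompt switches reasoning on or off.
            {"role": "system", "content": "detailed thinking on" if reasoning else "detailed thinking off"},
            {"role": "user", "content": prompt},
        ],
        **params,
    )
    text = response.choices[0].message.content
    # With Reasoning ON, an empty <think></think> block simply means no reasoning
    # was needed for this prompt; it is expected and can be stripped if unwanted.
    return text

print(ask("What is 2 + 2?", reasoning=True))
```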