license: mit
datasets:
  - >-
    CreitinGameplays/Magpie-Reasoning-V2-250K-CoT-Deepseek-R1-Llama-70B-filtered-mistral
language:
  - en
base_model:
  - mistralai/Mistral-Nemo-Instruct-2407
pipeline_tag: text-generation
library_name: transformers

Mistral Nemo 12B R1

Fine-tuning took 12 hours on a single NVIDIA H100 with the following settings:

Batch size: 26
Gradient accumulation steps: 1
Epochs: 1
Learning rate: 2e-5
Warmup ratio: 0.1
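To make the interaction between these settings concrete, here is a minimal sketch of how batch size, gradient accumulation, epochs, and warmup ratio translate into optimizer steps. The function name `schedule_steps`, the rounding convention (ceiling), and the example dataset size are assumptions for illustration; the card does not state the number of training examples or the scheduler used.

```python
import math


def schedule_steps(num_examples, batch_size=26, grad_accum_steps=1,
                   epochs=1, warmup_ratio=0.1):
    """Return (total_optimizer_steps, warmup_steps) for the given settings."""
    # One optimizer step consumes batch_size * grad_accum_steps examples.
    effective_batch = batch_size * grad_accum_steps
    steps_per_epoch = math.ceil(num_examples / effective_batch)
    total_steps = steps_per_epoch * epochs
    # Assumption: warmup length is the ceiling of ratio * total steps.
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    return total_steps, warmup_steps


# num_examples=26_000 is a made-up figure, not the real dataset size.
total, warmup = schedule_steps(num_examples=26_000)
print(f"total steps: {total}, warmup steps: {warmup}")
```

With gradient accumulation set to 1, the effective batch size equals the per-step batch size of 26, so the learning rate would ramp up over roughly the first tenth of the run before reaching 2e-5.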

Run the model:

import torch
from transformers import pipeline

model_id = "CreitinGameplays/Mistral-Nemo-12B-R1-v0.2"

pipe = pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto"
)

messages = [
    {"role": "system", "content": "You are a helpful AI assistant named Mistral Nemo."},
    {"role": "user", "content": "How many r's are in strawberry?"}
]

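outputs = pipe(
    messages,
    do_sample=True,  # temperature, top_p and top_k only apply when sampling
    temperature=0.6,
    top_p=0.95,
    top_k=40,
    repetition_penalty=1.1,
    max_new_tokens=2048
)

# The last message in the returned conversation is the assistant's reply.
print(outputs[0]["generated_text"][-1]["content"])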

Example response generated with this model:

User prompt:

how many r's are in strawberry?

Model output:

<think>
To determine the number of 'r's in the word "strawberry," I will start by breaking down the word into its component parts: "straw" and "berry."

Next, I will count the number of 'r's in each part. The word "straw" contains one 'r,' and the word "berry" contains two 'r's.

Finally, I will add the number of 'r's from both parts together, which gives me a total of three 'r's in the word "strawberry."
</think>

To determine the number of 'r's in the word "strawberry," let's break it down step by step:

1. **Break down the word:**
   
   The word "strawberry" can be divided into two parts: "straw" and "berry."

2. **Count the 'r's in each part:**
   
   - In "straw": There is **1** 'r'.
   - In "berry": There are **2** 'r's.

3. **Add the 'r's together:**
   
   \( 1 \text{ (from straw)} + 2 \text{ (from berry)} = 3 \)

Therefore, the word "strawberry" contains **3** 'r's.

\[
\boxed{3}
\]</s>
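Since the response above wraps its reasoning in `<think>` tags and ends with an `</s>` token, downstream code may want to separate the two parts. The following is a minimal sketch using only the standard library; `split_reasoning` is a hypothetical helper, not part of the model or of transformers, and it assumes at most one `<think>` block per response.

```python
import re

# Non-greedy match so only the first <think>...</think> block is captured.
THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)


def split_reasoning(text, eos_token="</s>"):
    """Split a response into (reasoning, answer); reasoning is None if absent."""
    text = text.strip()
    # Drop a trailing end-of-sequence token if the decoder left it in.
    if eos_token and text.endswith(eos_token):
        text = text[: -len(eos_token)].rstrip()
    match = THINK_RE.search(text)
    if match is None:
        return None, text
    reasoning = match.group(1).strip()
    answer = text[match.end():].strip()
    return reasoning, answer


# Shortened stand-in for a real model response.
sample = "<think>\nstraw has 1 r, berry has 2.\n</think>\n\nThe answer is **3**.</s>"
reasoning, answer = split_reasoning(sample)
print("reasoning:", reasoning)
print("answer:", answer)
```

If the model skips the reasoning step, the helper returns `None` for the reasoning and the full text as the answer, so callers can handle both cases.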

Recommended system prompt for activating reasoning step:

You are an AI focused on providing systematic, well-reasoned responses. Response Structure: - Format: <think>{reasoning}</think>{answer} - Reasoning: Minimum 6 logical steps only when it required in <think> block - Process: Think first, then answer.
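As a sketch of how this prompt could be plugged into the chat format used in the usage code, the snippet below builds a `messages` list with the recommended system prompt in place of the generic one. `REASONING_SYSTEM_PROMPT` and `build_messages` are hypothetical names introduced here; the prompt text is copied as it appears above, on a single line.

```python
# Recommended system prompt, copied verbatim from the model card.
REASONING_SYSTEM_PROMPT = (
    "You are an AI focused on providing systematic, well-reasoned responses. "
    "Response Structure: - Format: <think>{reasoning}</think>{answer} "
    "- Reasoning: Minimum 6 logical steps only when it required in <think> block "
    "- Process: Think first, then answer."
)


def build_messages(user_prompt, system_prompt=REASONING_SYSTEM_PROMPT):
    """Return a chat-style message list with the system prompt first."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_prompt},
    ]


messages = build_messages("How many r's are in strawberry?")
for message in messages:
    print(message["role"], "->", message["content"][:60])
```

The resulting list can be passed to the text-generation pipeline in the same way as the `messages` list in the usage example.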

Note: The model was fine-tuned mainly on an English dataset, so it may not perform well in other languages.