license: apache-2.0
license_link: https://huggingface.co/Qwen/Qwen2.5-0.5B-Instruct/blob/main/LICENSE
language:
  - tr
  - en
datasets:
  - ytu-ce-cosmos/gsm8k_tr
base_model:
  - erayalp/qwen2.5-0.5b-instruct-SFT-v2-tr-math-medium
pipeline_tag: text-generation
library_name: transformers
tags:
  - group-relative-policy-optimization
  - reinforcement-learning
  - curriculum-learning
  - math
  - supervised-fine-tuning
  - reasoning
  - turkish

Objective

This model is the final product of a multi-stage training pipeline designed to improve the Turkish mathematical reasoning capabilities of the compact Qwen2.5-0.5B model.

Starting from erayalp/qwen2.5-0.5b-instruct-sft-v2-tr-math-medium, which was fine-tuned on Turkish math problems requiring 2-3 reasoning steps, this version continues training on ytu-ce-cosmos/gsm8k_tr to improve step-by-step reasoning and generalization to multi-step grade-school math problems in such a small model.

This model is intended for:

Researchers exploring reinforcement learning on small LLMs.
Research on curriculum learning and multi-step math reasoning in small models.
Comparative baselines for Turkish math reasoning tasks at grade-school difficulty.

Limitations

With only 0.5B parameters, the model may not perform as robustly as larger models.
Math-specific hallucinations may persist on problem patterns that are underrepresented in the training data.
Prompt sensitivity and reasoning depth leave room for future improvement.

Roadmap

~~Phase 1: SFT with basic arithmetic and math problems~~
~~Phase 2: SFT with moderately difficult math problems~~
Phase 3: SFT with full-scale GSM8K-TR complexity
Phase 4: GRPO-based training to optimize multi-step reasoning and reduce hallucinations
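
For readers unfamiliar with GRPO, the core idea can be sketched as follows: several answers are sampled for the same problem, each is scored, and each score is normalized against the others in its group. This is a minimal illustration, not the training code used for this model. The function name, the reward values, and the choice of population standard deviation are assumptions made for the example; implementations differ in these details.

```python
from statistics import mean, pstdev

def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the other completions sampled for the same prompt."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption, libraries may use another estimator
    return [(r - mu) / (sigma + eps) for r in rewards]

# Hypothetical rewards for four sampled answers to one problem (1.0 = correct final answer)
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct answers receive a positive advantage and incorrect ones a negative advantage, and a group in which every answer scores the same yields all zeros, so it contributes no learning signal.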

How to Use

You can easily run inference using the Transformers library:

from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

model_name = "erayalp/qwen2.5-0.5B-instruct-GRPO-v3-tr-math-gsm8k"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype="auto",
    device_map="auto"
)

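# Turkish prompt: "There are 25 roses in a garden. There are 40 tulips. There are 35 daisies.
# What percentage of the flowers are not roses?"
prompt = "Bir bahçede 25 gül var. 40 lale vardır. 35 papatya var. Çiçeklerin yüzde kaçı gül değildir?"
# Move the input tensors to the device the model was loaded on (device_map="auto" may pick a GPU)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))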
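
To check a generated answer against the sample prompt above, one option is to pull the last number out of the decoded text and compare it with the reference value. The sketch below works through the arithmetic for that prompt (75 of 100 flowers are not roses). The helper `extract_last_number` and the `sample_output` string are hypothetical and serve only as illustration; the string is not real model output.

```python
import re

def extract_last_number(text):
    """Return the last number in the text as a float, or None if there is none."""
    # Accept both "." and "," as decimal separators, since Turkish text often uses ","
    matches = re.findall(r"-?\d+(?:[.,]\d+)?", text)
    if not matches:
        return None
    return float(matches[-1].replace(",", "."))

# Reference answer for the sample prompt: 25 roses, 40 tulips, 35 daisies
roses, tulips, daisies = 25, 40, 35
expected = 100 * (tulips + daisies) / (roses + tulips + daisies)

# Hypothetical decoded text, used only to illustrate the check
sample_output = "Toplam 100 çiçek var. Gül olmayanlar 75 tanedir, yani cevap %75."
is_correct = extract_last_number(sample_output) == expected
```

This heuristic is deliberately simple: it would misread thousands separators such as "1.000" and ignores units, so a stricter parser is advisable for real evaluation.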