arxiv:2508.19229

StepWiser: Stepwise Generative Judges for Wiser Reasoning

Published on Aug 26 · Submitted by wentingzhao on Aug 28

Abstract

AI-generated summary: StepWiser, a generative judge model trained with reinforcement learning, gives step-by-step feedback on a policy model's reasoning, improving the policy model's performance at both training and inference time.

As models increasingly leverage multi-step reasoning strategies to solve complex problems, supervising the logical validity of these intermediate steps has become a critical research challenge. Process reward models address this by providing step-by-step feedback, but current approaches have two major drawbacks: they typically function as classifiers without providing explanations, and their reliance on supervised fine-tuning with static datasets limits generalization. Inspired by recent advances, we reframe stepwise reward modeling from a classification task to a reasoning task itself. We thus propose a generative judge that reasons about the policy model's reasoning steps (i.e., meta-reasons), outputting thinking tokens before delivering a final verdict. Our model, StepWiser, is trained by reinforcement learning using relative outcomes of rollouts. We show that it (i) provides better judgment accuracy on intermediate steps than existing methods; (ii) can be used to improve the policy model at training time; and (iii) improves inference-time search.
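To make "relative outcomes of rollouts" concrete, here is a minimal sketch of one way such a signal could be computed. It is an illustration under assumptions, not the paper's actual procedure: the names `estimate_success_rate`, `label_steps`, and `judge_reward` are hypothetical, as are the rule "a step is good if the estimated success rate does not drop after it" and the 0/1 reward for the judge. The `rollout` callable stands in for sampling a completion from the policy model and checking its final answer.

```python
import random
from typing import Callable, List

# Hypothetical stand-in for "sample a completion from this prefix and
# check whether the final answer is correct".
Rollout = Callable[[List[str], random.Random], bool]


def estimate_success_rate(prefix: List[str], rollout: Rollout,
                          n: int, rng: random.Random) -> float:
    """Monte Carlo estimate of how often completing `prefix` succeeds."""
    return sum(rollout(prefix, rng) for _ in range(n)) / n


def label_steps(steps: List[str], rollout: Rollout,
                n_rollouts: int = 16, seed: int = 0) -> List[bool]:
    """Label each step by comparing rollout outcomes before and after it.

    Assumed rule (not taken from the paper): a step is labeled good when
    the estimated success rate after the step is at least as high as the
    estimate before it.
    """
    rng = random.Random(seed)
    labels = []
    before = estimate_success_rate([], rollout, n_rollouts, rng)
    for i in range(len(steps)):
        after = estimate_success_rate(steps[: i + 1], rollout, n_rollouts, rng)
        labels.append(after >= before)
        before = after  # the next step is judged relative to this one
    return labels


def judge_reward(verdict: bool, label: bool) -> float:
    """Assumed RL reward for the judge: 1 if its verdict matches the label."""
    return 1.0 if verdict == label else 0.0


if __name__ == "__main__":
    # Toy rollout: any prefix containing the step "bad" never succeeds.
    toy_rollout = lambda prefix, rng: "bad" not in prefix
    print(label_steps(["a", "bad", "c"], toy_rollout, n_rollouts=4))
```

In this toy setup the step after the mistake is labeled good, because it does not lower the success rate any further. That is a consequence of the relative rule assumed here, and a different rule (for example, an absolute threshold on the success rate) would label it differently.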

Community

🪜Introducing: StepWiser🦉
📝: http://arxiv.org/abs/2508.19229

  • Reframes stepwise reward modeling as a reasoning task: outputs CoT + judgment.
  • Trained by RL using relative outcomes of rollouts.
  • Results:
    (1) SOTA performance on ProcessBench!
    (2) Improves policy at train time.
    (3) Improves inference-time search.
