M1: Towards Scalable Test-Time Compute with Mamba Reasoning Models
Abstract
Effective reasoning is crucial to solving complex mathematical problems. Recent large language models (LLMs) have boosted performance by scaling test-time computation through long chain-of-thought reasoning. However, transformer-based models are inherently limited in extending context length due to their quadratic computational complexity and linear memory requirements. In this paper, we introduce a novel hybrid linear RNN reasoning model, M1, built on the Mamba architecture, which allows memory-efficient inference. Our approach leverages a distillation process from existing reasoning models and is further enhanced through RL training. Experimental results on the AIME and MATH benchmarks show that M1 not only outperforms previous linear RNN models but also matches the performance of state-of-the-art Deepseek R1 distilled reasoning models at a similar scale. We also compare our generation speed with a highly performant general purpose inference engine, vLLM, and observe more than a 3x speedup compared to a same size transformer. With throughput speedup, we are able to achieve higher accuracy compared to DeepSeek R1 distilled transformer reasoning models under a fixed generation time budget using self-consistency voting. Overall, we introduce a hybrid Mamba reasoning model and provide a more effective approach to scaling test-time generation using self-consistency or long chain of thought reasoning.
Community
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Thinking Slow, Fast: Scaling Inference Compute with Distilled Reasoners (2025)
- OpenCodeReasoning: Advancing Data Distillation for Competitive Coding (2025)
- m1: Unleash the Potential of Test-Time Scaling for Medical Reasoning with Large Language Models (2025)
- Efficient Reasoning Models: A Survey (2025)
- xLSTM 7B: A Recurrent LLM for Fast and Efficient Inference (2025)
- Stop Overthinking: A Survey on Efficient Reasoning for Large Language Models (2025)
- Think Twice: Enhancing LLM Reasoning by Scaling Multi-round Test-time Thinking (2025)
Please give a thumbs up to this comment if you found it helpful!
If you want recommendations for any Paper on Hugging Face checkout this Space
You can directly ask Librarian Bot for paper recommendations by tagging it in a comment:
@librarian-bot
recommend
Models citing this paper 2
Datasets citing this paper 3
Spaces citing this paper 0
No Space linking this paper