
RoboBrain 2.0: See Better. Think Harder. Do Smarter.
Project | Github | ModelScope | Technical Report (Coming soon) | WeChat
RoboOS: An Efficient Open-Source Multi-Robot Coordination System for RoboBrain.
RoboBrain 1.0: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
Overview
We are excited to introduce RoboBrain 2.0, the most powerful open-source embodied brain model to date. Compared to its predecessor, RoboBrain 1.0, this version significantly advances multi-agent task planning, spatial reasoning, and closed-loop execution. A detailed technical report will be released soon.

News
- 2025-06-07: We highlight the training framework (FlagScale) developed by the BAAI Framework R&D team and the evaluation framework (FlagEvalMM) developed by the BAAI FlagEval team. Both are used for RoboBrain 2.0.
- 2025-06-06: The RoboBrain 2.0-7B model checkpoint has been released on Hugging Face.
- 2025-06-06: We're excited to announce the release of our more powerful RoboBrain 2.0.
- 2025-04-11: RoboBrain 1.0 was selected for CVPR 2025's official Embodied AI Trends Commentary.
- 2025-02-27: RoboBrain 1.0 was accepted to CVPR 2025.
Todo
- Release model checkpoint for RoboBrain 2.0-7B
- Release quick inference example for RoboBrain 2.0
- Release training codes for RoboBrain 2.0
- Release model checkpoint for RoboBrain 2.0-32B
Features
RoboBrain 2.0 supports:
- Interactive reasoning with long-horizon planning and closed-loop feedback
- Spatial perception for precise point and bounding-box prediction from complex instructions
- Temporal perception for future trajectory estimation
- Scene reasoning through real-time construction and updating of structured memory
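The spatial predictions are returned as text, so downstream code typically has to parse them. The snippet below is a hypothetical sketch of such a parser, assuming points are emitted as `(x, y)` pairs and bounding boxes as `[x1, y1, x2, y2]` lists; the exact output format is defined by the official repo, so adapt the patterns accordingly.

```python
# Hypothetical parser for coordinate outputs; the actual output format is
# documented in the official RoboBrain 2.0 repo, so treat this as a sketch.
import re
from typing import List, Tuple

def parse_points(response: str) -> List[Tuple[int, int]]:
    """Extract (x, y) point predictions like "(512, 384)" from model text."""
    return [(int(x), int(y)) for x, y in re.findall(r"\((\d+),\s*(\d+)\)", response)]

def parse_bboxes(response: str) -> List[Tuple[int, ...]]:
    """Extract [x1, y1, x2, y2] boxes from model text."""
    pattern = r"\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]"
    return [tuple(map(int, box)) for box in re.findall(pattern, response)]

print(parse_points("The mug handle is at (512, 384)."))      # [(512, 384)]
print(parse_bboxes("Target region: [100, 150, 300, 400]."))  # [(100, 150, 300, 400)]
```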

Architecture
RoboBrain 2.0 supports multi-image, long video, and high-resolution visual inputs, along with complex task instructions and structured scene graphs on the language side. Visual inputs are processed via a Vision Encoder and MLP Projector, while textual inputs are tokenized into a unified token stream. All inputs are fed into an LLM Decoder that performs long-chain-of-thought reasoning and outputs structured plans, spatial relations, and both relative and absolute coordinates.
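As an illustration of this flow only (module names, dimensions, and the identity stand-ins are assumptions, not the released implementation), a minimal sketch of the encode-project-decode composition might look like this:

```python
# Illustrative-only sketch of the described pipeline; names and dimensions
# are assumptions, not the released RoboBrain 2.0 implementation.
import torch
import torch.nn as nn

class VisionLanguageSketch(nn.Module):
    def __init__(self, vision_dim=1024, hidden_dim=3584):
        super().__init__()
        self.vision_encoder = nn.Identity()   # stands in for a ViT-style encoder
        self.projector = nn.Sequential(       # MLP projector into the LLM embedding space
            nn.Linear(vision_dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, hidden_dim)
        )
        self.llm_decoder = nn.Identity()      # stands in for the LLM decoder

    def forward(self, image_patches, text_embeds):
        visual_tokens = self.projector(self.vision_encoder(image_patches))
        # Unified token stream: visual tokens followed by text tokens.
        tokens = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm_decoder(tokens)

model = VisionLanguageSketch()
out = model(torch.randn(1, 256, 1024), torch.randn(1, 32, 3584))
print(out.shape)  # torch.Size([1, 288, 3584])
```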

Model Zoo
| Models | Checkpoint | Description |
|--------|------------|-------------|
| RoboBrain 2.0 7B | BAAI/RoboBrain2.0-7B | 7B-parameter version of RoboBrain 2.0 |
| RoboBrain 2.0 32B | BAAI/RoboBrain2.0-32B | 32B-parameter version of RoboBrain 2.0 (Coming soon) |
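If you prefer to fetch a checkpoint ahead of time, `huggingface_hub` can mirror it locally; the target directory below is just an example.

```python
# Optional: pre-download the released 7B checkpoint for offline use.
# The local_dir path is an example; adjust it to your setup.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/RoboBrain2.0-7B",
    local_dir="./checkpoints/RoboBrain2.0-7B",
)
```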
Setup
```bash
# clone repo.
git clone https://github.com/FlagOpen/RoboBrain2.0.git
cd RoboBrain2.0

# build conda env.
conda create -n robobrain2 python=3.10
conda activate robobrain2
pip install -r requirements.txt
```
Simple Inference
Note: Please refer to the RoboBrain 2.0 GitHub repository for the official usage instructions.
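For a quick, self-contained starting point, the sketch below loads the released 7B checkpoint through the generic Hugging Face image-text-to-text interface. This assumes a recent transformers release and is not the officially documented path; the image file and prompt are placeholders.

```python
# A minimal sketch, assuming the checkpoint works with the standard Hugging Face
# image-text-to-text interface; see the GitHub repo for the supported usage.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "BAAI/RoboBrain2.0-7B"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

image = Image.open("example_scene.jpg")  # placeholder image path
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Point to the mug closest to the robot gripper."},
    ],
}]

# Render the chat template, then batch the text and image through the processor.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```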
More Results
Benchmark comparison across spatial reasoning and task planning. RoboBrain 2.0-32B achieves state-of-the-art performance on four key embodied intelligence benchmarks: BLINK-Spatial, CV-Bench, EmbSpatial, and RefSpatial. It not only outperforms leading open-source models such as Qwen2.5-VL, but also surpasses closed-source models like o4-mini, Gemini 2.5 Pro, and Claude Sonnet 4, especially on the challenging RefSpatial benchmark, where RoboBrain 2.0 shows a >50% absolute improvement.

Citation
If you find this project useful, please consider citing us.
```bibtex
@article{RoboBrain2.0TechnicalReport,
  title={RoboBrain 2.0 Technical Report},
  author={BAAI RoboBrain Team},
  journal={arXiv preprint arXiv:TODO},
  year={2025}
}

@article{RoboBrain1.0,
  title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
  author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
  journal={arXiv preprint arXiv:2502.21257},
  year={2025}
}

@article{RoboOS,
  title={RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration},
  author={Tan, Huajie and Hao, Xiaoshuai and Lin, Minglan and Wang, Pengwei and Lyu, Yaoxu and Cao, Mingyu and Wang, Zhongyuan and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2505.03673},
  year={2025}
}

@article{zhou2025roborefer,
  title={RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics},
  author={Zhou, Enshen and An, Jingkun and Chi, Cheng and Han, Yi and Rong, Shanyu and Zhang, Chi and Wang, Pengwei and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and others},
  journal={arXiv preprint arXiv:2506.04308},
  year={2025}
}

@article{Reason-RFT,
  title={Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning},
  author={Tan, Huajie and Ji, Yuheng and Hao, Xiaoshuai and Lin, Minglan and Wang, Pengwei and Wang, Zhongyuan and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2503.20752},
  year={2025}
}

@article{Code-as-Monitor,
  title={Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection},
  author={Zhou, Enshen and Su, Qi and Chi, Cheng and Zhang, Zhizheng and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and Wang, He},
  journal={arXiv preprint arXiv:2412.04455},
  year={2024}
}
```