
RoboBrain 2.0: See Better. Think Harder. Do Smarter.
Project | Github | ModelScope | Technical Report (Coming soon) | WeChat
RoboOS: An Efficient Open-Source Multi-Robot Coordination System for RoboBrain.
RoboBrain 1.0: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.
Overview
We are excited to introduce RoboBrain 2.0, the most powerful open-source embodied brain model to date. Compared to its predecessor, RoboBrain 1.0, this version significantly advances multi-agent task planning, spatial reasoning, and closed-loop execution. A detailed technical report will be released soon.

News
- 2025-06-07: We highlight the training framework (FlagScale) developed by the BAAI Framework R&D team and the evaluation framework (FlagEvalMM) developed by the BAAI FlagEval team. Both are used for RoboBrain 2.0.
- 2025-06-06: The RoboBrain 2.0-7B model checkpoint has been released on Hugging Face.
- 2025-06-06: We're excited to announce the release of our more powerful RoboBrain 2.0.
- 2025-04-11: RoboBrain 1.0 was selected for CVPR 2025's official Embodied AI Trends Commentary.
- 2025-02-27: RoboBrain 1.0 was accepted to CVPR 2025.
Todo
- Release model checkpoint for RoboBrain 2.0-7B
- Release quick inference example for RoboBrain 2.0
- Release training codes for RoboBrain 2.0
- Release model checkpoint for RoboBrain 2.0-32B
Features
RoboBrain 2.0 supports:
- Interactive reasoning with long-horizon planning and closed-loop feedback
- Spatial perception for precise point and bounding-box prediction from complex instructions
- Temporal perception for future trajectory estimation
- Scene reasoning through real-time construction and updating of structured memory
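The spatial predictions are returned as text, so downstream code typically has to parse them. The snippet below is a hypothetical sketch of such a parser, assuming points are emitted as `(x, y)` pairs and bounding boxes as `[x1, y1, x2, y2]` lists; the exact output format is defined by the official repo, so adapt the patterns accordingly.

```python
# Hypothetical parser for coordinate outputs; the actual output format is
# documented in the official RoboBrain 2.0 repo, so treat this as a sketch.
import re
from typing import List, Tuple

def parse_points(response: str) -> List[Tuple[int, int]]:
    """Extract (x, y) point predictions like "(512, 384)" from model text."""
    return [(int(x), int(y)) for x, y in re.findall(r"\((\d+),\s*(\d+)\)", response)]

def parse_bboxes(response: str) -> List[Tuple[int, ...]]:
    """Extract [x1, y1, x2, y2] boxes from model text."""
    pattern = r"\[(\d+),\s*(\d+),\s*(\d+),\s*(\d+)\]"
    return [tuple(map(int, box)) for box in re.findall(pattern, response)]

print(parse_points("The mug handle is at (512, 384)."))      # [(512, 384)]
print(parse_bboxes("Target region: [100, 150, 300, 400]."))  # [(100, 150, 300, 400)]
```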

Architecture
RoboBrain 2.0 supports multi-image, long video, and high-resolution visual inputs, along with complex task instructions and structured scene graphs on the language side. Visual inputs are processed via a Vision Encoder and MLP Projector, while textual inputs are tokenized into a unified token stream. All inputs are fed into an LLM Decoder that performs long-chain-of-thought reasoning and outputs structured plans, spatial relations, and both relative and absolute coordinates.
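As an illustration of this flow only (module names, dimensions, and the identity stand-ins are assumptions, not the released implementation), a minimal sketch of the encode-project-decode composition might look like this:

```python
# Illustrative-only sketch of the described pipeline; names and dimensions
# are assumptions, not the released RoboBrain 2.0 implementation.
import torch
import torch.nn as nn

class VisionLanguageSketch(nn.Module):
    def __init__(self, vision_dim=1024, hidden_dim=3584):
        super().__init__()
        self.vision_encoder = nn.Identity()   # stands in for a ViT-style encoder
        self.projector = nn.Sequential(       # MLP projector into the LLM embedding space
            nn.Linear(vision_dim, hidden_dim), nn.GELU(), nn.Linear(hidden_dim, hidden_dim)
        )
        self.llm_decoder = nn.Identity()      # stands in for the LLM decoder

    def forward(self, image_patches, text_embeds):
        visual_tokens = self.projector(self.vision_encoder(image_patches))
        # Unified token stream: visual tokens followed by text tokens.
        tokens = torch.cat([visual_tokens, text_embeds], dim=1)
        return self.llm_decoder(tokens)

model = VisionLanguageSketch()
out = model(torch.randn(1, 256, 1024), torch.randn(1, 32, 3584))
print(out.shape)  # torch.Size([1, 288, 3584])
```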

Model Zoo
| Models | Checkpoint | Description |
|--------|------------|-------------|
| RoboBrain 2.0 7B | BAAI/RoboBrain2.0-7B | 7B-parameter version of RoboBrain 2.0 |
| RoboBrain 2.0 32B | BAAI/RoboBrain2.0-32B | 32B-parameter version of RoboBrain 2.0 (Coming soon) |
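If you prefer to fetch a checkpoint ahead of time, `huggingface_hub` can mirror it locally; the target directory below is just an example.

```python
# Optional: pre-download the released 7B checkpoint for offline use.
# The local_dir path is an example; adjust it to your setup.
from huggingface_hub import snapshot_download

snapshot_download(
    repo_id="BAAI/RoboBrain2.0-7B",
    local_dir="./checkpoints/RoboBrain2.0-7B",
)
```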
Setup
```bash
# clone repo.
git clone https://github.com/FlagOpen/RoboBrain2.0.git
cd RoboBrain2.0

# build conda env.
conda create -n robobrain2 python=3.10
conda activate robobrain2
pip install -r requirements.txt
```
Simple Inference
Note: Please refer to the RoboBrain 2.0 GitHub repository for the official usage instructions.
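For a quick, self-contained starting point, the sketch below loads the released 7B checkpoint through the generic Hugging Face image-text-to-text interface. This assumes a recent transformers release and is not the officially documented path; the image file and prompt are placeholders.

```python
# A minimal sketch, assuming the checkpoint works with the standard Hugging Face
# image-text-to-text interface; see the GitHub repo for the supported usage.
import torch
from PIL import Image
from transformers import AutoModelForImageTextToText, AutoProcessor

model_id = "BAAI/RoboBrain2.0-7B"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForImageTextToText.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto", trust_remote_code=True
)

image = Image.open("example_scene.jpg")  # placeholder image path
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": "Point to the mug closest to the robot gripper."},
    ],
}]

# Render the chat template, then batch the text and image through the processor.
prompt = processor.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[image], return_tensors="pt").to(model.device)

with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)

# Decode only the newly generated tokens.
answer = processor.batch_decode(
    output_ids[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```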
More Results
Benchmark comparison across spatial reasoning and task planning. RoboBrain 2.0-32B achieves state-of-the-art performance on four key embodied intelligence benchmarks: BLINK-Spatial, CV-Bench, EmbSpatial, and RefSpatial. It not only outperforms leading open-source models such as Qwen2.5-VL, but also surpasses closed-source models like o4-mini, Gemini 2.5 Pro, and Claude Sonnet 4, especially on the challenging RefSpatial benchmark, where RoboBrain 2.0 shows a >50% absolute improvement.

Citation
If you find this project useful, please consider citing us.
```bibtex
@article{RoboBrain2.0TechnicalReport,
  title={RoboBrain 2.0 Technical Report},
  author={BAAI RoboBrain Team},
  journal={arXiv preprint arXiv:TODO},
  year={2025}
}

@article{RoboBrain1.0,
  title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
  author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
  journal={arXiv preprint arXiv:2502.21257},
  year={2025}
}

@article{RoboOS,
  title={RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration},
  author={Tan, Huajie and Hao, Xiaoshuai and Lin, Minglan and Wang, Pengwei and Lyu, Yaoxu and Cao, Mingyu and Wang, Zhongyuan and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2505.03673},
  year={2025}
}

@article{zhou2025roborefer,
  title={RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics},
  author={Zhou, Enshen and An, Jingkun and Chi, Cheng and Han, Yi and Rong, Shanyu and Zhang, Chi and Wang, Pengwei and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and others},
  journal={arXiv preprint arXiv:2506.04308},
  year={2025}
}

@article{Reason-RFT,
  title={Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning},
  author={Tan, Huajie and Ji, Yuheng and Hao, Xiaoshuai and Lin, Minglan and Wang, Pengwei and Wang, Zhongyuan and Zhang, Shanghang},
  journal={arXiv preprint arXiv:2503.20752},
  year={2025}
}

@article{Code-as-Monitor,
  title={Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection},
  author={Zhou, Enshen and Su, Qi and Chi, Cheng and Zhang, Zhizheng and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and Wang, He},
  journal={arXiv preprint arXiv:2412.04455},
  year={2024}
}
```