RoboBrain 2.0: See Better. Think Harder. Do Smarter.

  ⭐️ Project   |   ⭐️ GitHub   |   🤖 ModelScope   |   📑 Technical Report (Coming soon)   |   💬 WeChat

  🎯 RoboOS: An Efficient Open-Source Multi-Robot Coordination System for RoboBrain.

  🌍 RoboBrain 1.0: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete.

🔥 Overview

We are excited to introduce RoboBrain 2.0, the most powerful open-source embodied brain model to date. Compared to its predecessor, RoboBrain 1.0, the latest version delivers significant advances in multi-agent task planning, spatial reasoning, and closed-loop execution. A detailed technical report will be released soon.

🗞️ News

  • 2025-06-07: 🎉 We highlight the training framework (FlagScale) developed by the BAAI Framework R&D team and the evaluation framework (FlagEvalMM) developed by the BAAI FlagEval team; both are used for RoboBrain 2.0.
  • 2025-06-06: 🤗 The RoboBrain 2.0-7B model checkpoint has been released on Hugging Face.
  • 2025-06-06: 🔥 We are excited to announce the release of our more powerful RoboBrain 2.0.
  • 2025-04-11: 🎉 RoboBrain 1.0 was selected for CVPR 2025's official Embodied AI Trends Commentary.
  • 2025-02-27: 🌍 RoboBrain 1.0 was accepted to CVPR 2025.

📆 Todo

  • Release model checkpoint for RoboBrain 2.0-7B
  • Release quick inference example for RoboBrain 2.0
  • Release training codes for RoboBrain 2.0
  • Release model checkpoint for RoboBrain 2.0-32B

🚀 Features

RoboBrain 2.0 supports:

  • Interactive reasoning with long-horizon planning and closed-loop feedback.
  • Spatial perception for precise point and bounding-box (bbox) prediction from complex instructions.
  • Temporal perception for future trajectory estimation.
  • Scene reasoning through real-time structured memory construction and updates.
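
To make the spatial-perception capability concrete, below is a purely illustrative input/output pair written as a small Python snippet; the field names and pixel-coordinate convention are invented for this example and are not RoboBrain 2.0's actual output schema.

# Purely illustrative example of a spatial-perception query and a structured answer.
# Field names and the pixel-coordinate convention are invented for illustration;
# they are NOT RoboBrain 2.0's actual output schema.
instruction = "Point to the mug handle closest to the robot gripper."

example_response = {
    "target": "mug handle (closest to gripper)",
    "point": [412, 287],            # hypothetical (x, y) in image pixels
    "bbox": [384, 251, 446, 318],   # hypothetical (x_min, y_min, x_max, y_max)
}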

⭐️ Architecture

RoboBrain 2.0 supports multi-image, long-video, and high-resolution visual inputs, along with complex task instructions and structured scene graphs on the language side. Visual inputs are processed by a Vision Encoder and an MLP Projector, while textual inputs are tokenized into a unified token stream. All inputs are fed into an LLM Decoder that performs long-chain-of-thought reasoning and outputs structured plans, spatial relations, and both relative and absolute coordinates.
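
For intuition only, here is a minimal PyTorch-style sketch of the data flow described above (Vision Encoder, MLP Projector, unified token stream, LLM Decoder). Module names, shapes, and dimensions are illustrative assumptions, not the actual RoboBrain 2.0 implementation.

# Schematic sketch of the data flow described above; all names and dims are
# illustrative assumptions, not the real RoboBrain 2.0 code.
import torch
import torch.nn as nn

class MLPProjector(nn.Module):
    """Maps vision-encoder features into the LLM embedding space."""
    def __init__(self, vision_dim: int, llm_dim: int):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        return self.mlp(vision_feats)

def build_token_stream(vision_encoder, projector, text_embedder,
                       images: torch.Tensor, text_ids: torch.Tensor) -> torch.Tensor:
    """Fuse visual and textual tokens into one sequence for the LLM decoder."""
    vision_feats = vision_encoder(images)      # (B, N_img_tokens, vision_dim)
    vision_tokens = projector(vision_feats)    # (B, N_img_tokens, llm_dim)
    text_tokens = text_embedder(text_ids)      # (B, N_text_tokens, llm_dim)
    # The decoder then reasons over the concatenated stream and emits plans,
    # spatial relations, and coordinates as text.
    return torch.cat([vision_tokens, text_tokens], dim=1)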

🤗 Model Zoo

| Models | Checkpoint | Description |
|--------|------------|-------------|
| RoboBrain 2.0 7B | 🤗 BAAI/RoboBrain2.0-7B | 7B-parameter version of RoboBrain 2.0 |
| RoboBrain 2.0 32B | 🤗 BAAI/RoboBrain2.0-32B | 32B-parameter version of RoboBrain 2.0 (coming soon) |
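
If you want to fetch the weights ahead of time (for example, onto an offline machine), the checkpoint can be pulled with the standard huggingface_hub client; the local directory below is just an example path.

# Download the RoboBrain 2.0-7B checkpoint with the standard Hugging Face Hub client.
# The target directory is an arbitrary example path.
from huggingface_hub import snapshot_download

local_dir = snapshot_download(
    repo_id="BAAI/RoboBrain2.0-7B",
    local_dir="./checkpoints/RoboBrain2.0-7B",  # example location
)
print(f"Checkpoint downloaded to: {local_dir}")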

🛠️ Setup

# clone repo.
git clone https://github.com/FlagOpen/RoboBrain2.0.git
cd RoboBrain2.0

# build conda env.
conda create -n robobrain2 python=3.10
conda activate robobrain2
pip install -r requirements.txt

🤖 Simple Inference

Note: Please refer to the RoboBrain 2.0 GitHub repository for detailed usage instructions for RoboBrain 2.0.
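
As a rough placeholder until you consult the repository, the sketch below assumes the checkpoint loads through the generic transformers Auto classes with trust_remote_code=True; the prompt format, processor call, and generation settings are assumptions, so treat the GitHub examples as the reference.

# Rough placeholder only -- the authoritative usage is in the RoboBrain 2.0 GitHub repo.
# Assumes the checkpoint loads via the generic transformers Auto classes.
import torch
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "BAAI/RoboBrain2.0-7B"
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,
    device_map="auto",
    trust_remote_code=True,
)

# Hypothetical prompt: ask for a plan given an image of a tabletop scene.
image = Image.open("example_scene.jpg")  # example image path
prompt = "Plan the steps to place the red cup onto the tray."

inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    output_ids = model.generate(**inputs, max_new_tokens=256)
print(processor.batch_decode(output_ids, skip_special_tokens=True)[0])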

😊 More Results

Benchmark comparison across spatial reasoning and task planning. RoboBrain 2.0-32B achieves state-of-the-art performance on four key embodied intelligence benchmarks: BLINK-Spatial, CV-Bench, EmbSpatial, and RefSpatial. It not only outperforms leading open-source models such as Qwen2.5-VL, but also surpasses closed-source models like o4-mini, Gemini 2.5 Pro, and Claude Sonnet 4, especially on the challenging RefSpatial benchmark, where RoboBrain 2.0 shows an absolute improvement of more than 50%.

📑 Citation

If you find this project useful, please consider citing us:

@article{robobrain2025technical,
    title={RoboBrain 2.0 Technical Report},
    author={BAAI RoboBrain Team},
    journal={arXiv preprint arXiv:TODO},
    year={2025}
}

@article{ji2025robobrain,
    title={RoboBrain: A Unified Brain Model for Robotic Manipulation from Abstract to Concrete},
    author={Ji, Yuheng and Tan, Huajie and Shi, Jiayu and Hao, Xiaoshuai and Zhang, Yuan and Zhang, Hengyuan and Wang, Pengwei and Zhao, Mengdi and Mu, Yao and An, Pengju and others},
    journal={arXiv preprint arXiv:2502.21257},
    year={2025}
}

@article{tan2025roboos,
    title={RoboOS: A Hierarchical Embodied Framework for Cross-Embodiment and Multi-Agent Collaboration},
    author={Tan, Huajie and Hao, Xiaoshuai and Lin, Minglan and Wang, Pengwei and Lyu, Yaoxu and Cao, Mingyu and Wang, Zhongyuan and Zhang, Shanghang},
    journal={arXiv preprint arXiv:2505.03673},
    year={2025}
}

@article{zhou2025roborefer,
    title={RoboRefer: Towards Spatial Referring with Reasoning in Vision-Language Models for Robotics},
    author={Zhou, Enshen and An, Jingkun and Chi, Cheng and Han, Yi and Rong, Shanyu and Zhang, Chi and Wang, Pengwei and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and others},
    journal={arXiv preprint arXiv:2506.04308},
    year={2025}
}

@article{tan2025reasonrft,
    title={Reason-RFT: Reinforcement Fine-Tuning for Visual Reasoning},
    author={Tan, Huajie and Ji, Yuheng and Hao, Xiaoshuai and Lin, Minglan and Wang, Pengwei and Wang, Zhongyuan and Zhang, Shanghang},
    journal={arXiv preprint arXiv:2503.20752},
    year={2025}
}

@article{zhou2024codeasmonitor,
    title={Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection},
    author={Zhou, Enshen and Su, Qi and Chi, Cheng and Zhang, Zhizheng and Wang, Zhongyuan and Huang, Tiejun and Sheng, Lu and Wang, He},
    journal={arXiv preprint arXiv:2412.04455},
    year={2024}
}