Papers
arxiv:2504.15785

WALL-E 2.0: World Alignment by NeuroSymbolic Learning improves World Model-based LLM Agents

Published on Apr 22
· Submitted by zhoutianyi on Apr 23
Abstract

Can we build accurate world models out of large language models (LLMs)? How can world models benefit LLM agents? The gap between LLMs' prior knowledge and a given environment's dynamics usually bottlenecks their performance as world models. To bridge this gap, we propose a training-free "world alignment" method that learns an environment's symbolic knowledge complementary to the LLM. The symbolic knowledge covers action rules, knowledge graphs, and scene graphs, which are extracted by LLMs from exploration trajectories and encoded into executable code to regulate LLM agents' policies. We further propose an RL-free, model-based agent, "WALL-E 2.0", built on the model-predictive control (MPC) framework. Unlike classical MPC, which requires costly optimization on the fly, we adopt an LLM agent as an efficient look-ahead optimizer of future actions by interacting with the neurosymbolic world model. While the LLM agent's strong heuristics make it an efficient planner in MPC, the quality of its planned actions is also secured by the accurate predictions of the aligned world model. Together they considerably improve learning efficiency in a new environment. On open-world challenges in Mars (Minecraft-like) and ALFWorld (embodied indoor environments), WALL-E 2.0 significantly outperforms existing methods, e.g., surpassing baselines in Mars by 16.1%-51.6% in success rate and by at least 61.7% in score. In ALFWorld, it achieves a record 98% success rate after only 4 iterations.
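The MPC-style planning described above, with the LLM acting as a look-ahead optimizer over the world model, can be sketched roughly as follows. Note that `llm_propose`, `world_model_predict`, and `score` are hypothetical stand-ins for illustration, not the paper's actual interfaces:

```python
# Minimal sketch of MPC planning with an LLM as look-ahead optimizer.
# The LLM proposes candidate action sequences; the (neurosymbolic) world
# model simulates each rollout, and the first action of the best-scoring
# rollout is returned for execution in the real environment.

def plan_with_mpc(state, llm_propose, world_model_predict, score,
                  horizon=3, num_candidates=4):
    """Return the first action of the best-scoring predicted rollout."""
    best_action, best_value = None, float("-inf")
    for _ in range(num_candidates):
        actions = llm_propose(state, horizon)   # LLM-generated candidate plan
        s, value = state, 0.0
        for a in actions:                       # roll out on the world model
            s = world_model_predict(s, a)
            value += score(s)
        if value > best_value:
            best_action, best_value = actions[0], value
    return best_action
```

In classical MPC this inner proposal step would be a costly numerical optimization; here the LLM's heuristics replace it, while the aligned world model keeps the rollouts accurate.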

Community

Paper author Paper submitter

WALL-E 2.0 = Neuro-Symbolic World Model + MPC-based LLM Agent

Neuro-Symbolic World Model = LLM + complementary symbolic knowledge (action rules, scene graph, knowledge graph) extracted from the LLM's prediction errors in past experiences
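As a toy illustration of extracting symbolic knowledge from experience: a precondition rule for an action can be induced from the trajectories where the world model's prediction disagreed with the environment. The simplified rule format below (an item set required by every observed success) is an assumption for illustration; the paper encodes rules as executable code via the LLM:

```python
# Toy sketch of inducing an action-precondition rule from past
# experience. A rule here is the set of inventory items present in
# every successful execution of the action, i.e. candidate requirements
# that explain why executions without them failed.

def induce_precondition_rule(action, experiences):
    """experiences: list of (inventory_set, action_name, succeeded) tuples.
    Returns the item set shared by all successes, or None if unseen."""
    successes = [inv for inv, a, ok in experiences if a == action and ok]
    if not successes:
        return None
    # Items common to every success are candidate preconditions.
    return set.intersection(*map(set, successes))
```

Such a rule can then be checked against any failed prediction: if a failure's inventory lacks an item in the induced set, the rule explains the world model's error.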

New updates over WALL-E 1.0:

  • Inductive Learning of Knowledge Graph: WALL-E 2.0 constructs a knowledge graph by using the LLM's inductive reasoning to infer symbolic relations (e.g., require, consume) from past experience, enriching the agent's understanding of action preconditions and effects.
  • Dynamic Scene Graph Extraction: WALL-E 2.0 dynamically builds a scene graph from real-time environment feedback, providing a structured and up-to-date representation of objects and their spatial relationships in the environment.
  • Neuro-Symbolic World Model Integration: WALL-E 2.0 combines executable action rules, the knowledge graph, and the scene graph with an LLM, yielding a unified neurosymbolic world model. This lets the LLM agent perform scene-aware, structured, and interpretable planning, significantly improving its adaptation to complex, dynamic environments.
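How the symbolic components might regulate a planned action can be sketched as below. The triple format, relation names, and `check_action` interface are illustrative assumptions, not the paper's implementation:

```python
# Minimal sketch: knowledge-graph triples (require/consume) and a scene
# graph gate each action the LLM agent proposes, before it reaches the
# environment. All names and data formats here are assumptions.

KNOWLEDGE_GRAPH = [
    ("craft_torch", "require", "stick"),
    ("craft_torch", "consume", "coal"),
]

SCENE_GRAPH = {("crafting_table", "near", "agent")}

def check_action(action, inventory, kg=KNOWLEDGE_GRAPH, scene=SCENE_GRAPH):
    """Return (ok, reason): KG preconditions checked against the
    inventory, plus a (hypothetical) spatial check on the scene graph."""
    for head, rel, obj in kg:
        if head == action and rel in ("require", "consume") and obj not in inventory:
            return False, f"missing {obj} ({rel}d by {action})"
    if action.startswith("craft_") and ("crafting_table", "near", "agent") not in scene:
        return False, "no crafting table nearby"
    return True, "ok"
```

An action that fails the check can be rejected with its reason fed back to the LLM planner, which is one way structured symbolic knowledge makes the planning loop interpretable.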

New SOTA on ALFWorld and Mars (Minecraft-like) tasks:

[Screenshot: benchmark results on ALFWorld and Mars]

coooool

