Training robotic policies directly in the real world is expensive and unscalable. Although generative simulation enables large-scale data synthesis, current approaches often fail to generate logically coherent long-horizon tasks and, because they execute open-loop, struggle with dynamic physical uncertainties. To address these challenges, we propose Affordance-Graphed Task Worlds (AGT-World), a unified framework that autonomously constructs interactive simulated environments and corresponding robot task policies from real-world observations. Unlike methods that rely on random proposals or static replication, AGT-World formalizes the task space as a structured graph, enabling precise, hierarchical decomposition of complex goals into theoretically grounded atomic primitives. Furthermore, we introduce a Self-Evolution mechanism that autonomously refines policies using hybrid feedback, combining Vision-Language Model (VLM) reasoning with geometric verification. Extensive experiments demonstrate that our method significantly outperforms existing methods in success rate and generalization, achieving a self-improving cycle of proposal, execution, and correction for scalable robot learning.
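To make the two mechanisms named above more concrete, the following is a minimal illustrative sketch, not AGT-World's implementation. The abstract does not specify the graph schema, the primitive vocabulary, or the feedback interfaces, so all of these are assumptions: the primitive set `ATOMIC_PRIMITIVES`, the helpers `decompose` and `self_evolve`, the task names, and the toy simulator are hypothetical. `decompose` expands a goal node depth-first through an ordered task graph until only atomic primitives remain. `self_evolve` runs a propose-execute-correct loop that accepts a plan only when both a semantic check (a stand-in for VLM reasoning) and a geometric check pass.

```python
from typing import Callable, Optional

# Hypothetical primitive vocabulary; AGT-World's actual primitives are not listed in the abstract.
ATOMIC_PRIMITIVES = {"grasp", "place", "open", "close"}


def primitive_name(task: str) -> str:
    """Strip arguments, e.g. 'grasp(apple)' -> 'grasp'."""
    return task.split("(", 1)[0]


def decompose(goal: str, graph: dict[str, list[str]], _stack: tuple[str, ...] = ()) -> list[str]:
    """Depth-first expansion of a goal node into an ordered sequence of atomic primitives."""
    if primitive_name(goal) in ATOMIC_PRIMITIVES:
        return [goal]
    if goal in _stack:
        raise ValueError(f"cycle in task graph at {goal!r}")
    if goal not in graph:
        raise KeyError(f"no decomposition for non-atomic task {goal!r}")
    plan: list[str] = []
    for sub in graph[goal]:  # children are ordered sub-goals
        plan.extend(decompose(sub, graph, _stack + (goal,)))
    return plan


def self_evolve(
    plan: list[str],
    execute: Callable[[list[str]], dict],
    vlm_check: Callable[[dict], tuple[bool, str]],
    geom_check: Callable[[dict], bool],
    correct: Callable[[list[str], str], list[str]],
    max_rounds: int = 3,
) -> tuple[Optional[list[str]], int]:
    """Propose -> execute -> correct loop; accept only when BOTH feedback signals pass."""
    for round_idx in range(max_rounds):
        state = execute(plan)
        vlm_ok, hint = vlm_check(state)  # semantic judgment (stand-in for a VLM)
        geom_ok = geom_check(state)      # geometric verification (stand-in for pose/contact checks)
        if vlm_ok and geom_ok:
            return plan, round_idx
        plan = correct(plan, hint)       # revise the plan using the textual hint
    return None, max_rounds


# Hypothetical affordance task graph: composite task -> ordered sub-tasks.
graph = {
    "store_apple_in_drawer": ["open(drawer)", "pick_and_place(apple, drawer)", "close(drawer)"],
    "pick_and_place(apple, drawer)": ["grasp(apple)", "place(apple, drawer)"],
}
print(decompose("store_apple_in_drawer", graph))
# ['open(drawer)', 'grasp(apple)', 'place(apple, drawer)', 'close(drawer)']

# Toy simulator and feedback stubs: the apple only ends up in the drawer if it was opened first.
execute = lambda p: {"apple_in_drawer": "open(drawer)" in p and "place(apple, drawer)" in p}
vlm_check = lambda s: (s["apple_in_drawer"], "" if s["apple_in_drawer"] else "open drawer first")
geom_check = lambda s: s["apple_in_drawer"]
correct = lambda p, hint: ["open(drawer)"] + p if hint == "open drawer first" else p

faulty_plan = ["grasp(apple)", "place(apple, drawer)", "close(drawer)"]
print(self_evolve(faulty_plan, execute, vlm_check, geom_check, correct))
# (['open(drawer)', 'grasp(apple)', 'place(apple, drawer)', 'close(drawer)'], 1)
```

Requiring both signals to agree before accepting a plan reflects the intuition behind hybrid feedback: a semantic judgment can miss physical failures, and a purely geometric check cannot explain what to change. In this sketch the correction step simply patches the plan from a textual hint. A real system would presumably regenerate or repair the plan with a learned model.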