World Action Verifier: Self-Improving World Models via Forward-Inverse Asymmetry

General-purpose world models promise scalable policy evaluation, optimization, and planning, yet achieving the required level of robustness remains challenging. Unlike policy learning, which primarily focuses on optimal actions, a world model needs to be reliable over a vast space of suboptimal actions, which are often underrepresented in action-labeled robot interactions. To address this challenge, we propose World Action Verifier (WAV), a framework that enables world models to identify their own prediction errors and self-improve. The key idea is to decompose action-conditioned state prediction into two independently verifiable factors: state plausibility and action reachability. We show that verifying these factors is significantly more tractable than direct forward prediction due to two underlying asymmetries: the broader availability of action-free data and the lower dimensionality of action-relevant features. Leveraging these asymmetries, we augment a world model with (i) a diverse subgoal generator obtained from video corpora and (ii) a sparse inverse model that infers actions from a subset of state features. By enforcing cycle consistency among proposed subgoals, inferred actions, and forward rollouts, WAV provides an effective verification mechanism in under-explored regimes, where existing methods often fail. Across nine tasks spanning MiniGrid, RoboMimic, and ManiSkill, our method achieves 2x higher sample efficiency while improving downstream policy performance by over 22%.
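The cycle-consistency idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the grid-world state layout, the function names (`propose_subgoals`, `sparse_inverse_model`, `flawed_world_model`, `find_inconsistent`), and the L1 error with a zero tolerance are all assumptions made for illustration. The subgoal generator is a hand-written stand-in rather than a model learned from video, and the forward model contains a deliberate error so that the check has something to find.

```python
# Toy sketch of a cycle-consistency check; every name here is hypothetical
# and none of it is taken from the WAV paper's code.
from typing import Callable, List, Tuple

State = Tuple[int, int, int]   # (x, y, distractor): only x, y are action-relevant
Action = Tuple[int, int]       # (dx, dy)

ACTIONS: List[Action] = [(1, 0), (-1, 0), (0, 1), (0, -1)]


def propose_subgoals(state: State) -> List[State]:
    """Stand-in for a subgoal generator: plausible states one step away."""
    x, y, d = state
    return [(x + dx, y + dy, d) for dx, dy in ACTIONS]


def sparse_inverse_model(state: State, subgoal: State) -> Action:
    """Infer the action from the action-relevant subset (x, y) only."""
    return (subgoal[0] - state[0], subgoal[1] - state[1])


def flawed_world_model(state: State, action: Action) -> State:
    """Forward model with a deliberate error: it ignores 'move left'."""
    x, y, d = state
    if action == (-1, 0):
        return state  # wrong prediction, standing in for an under-explored regime
    return (x + action[0], y + action[1], d)


def cycle_error(state: State, subgoal: State,
                forward: Callable[[State, Action], State]) -> int:
    """L1 distance (on x, y) between the forward rollout and the subgoal."""
    action = sparse_inverse_model(state, subgoal)
    predicted = forward(state, action)
    return abs(predicted[0] - subgoal[0]) + abs(predicted[1] - subgoal[1])


def find_inconsistent(state: State,
                      forward: Callable[[State, Action], State],
                      tol: int = 0) -> List[Tuple[State, Action]]:
    """Return (subgoal, inferred action) pairs whose cycle error exceeds tol."""
    flagged = []
    for goal in propose_subgoals(state):
        if cycle_error(state, goal, forward) > tol:
            flagged.append((goal, sparse_inverse_model(state, goal)))
    return flagged


flagged = find_inconsistent((2, 2, 7), flawed_world_model)
print(flagged)  # [((1, 2, 7), (-1, 0))]
```

In this sketch the flagged pairs mark transitions where the forward model disagrees with the subgoal it was asked to reach. Under the framing of the abstract, such transitions would be natural candidates for collecting additional training data, though how WAV actually selects and uses them is not specified here.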

