Pretrained-feature world models provide a useful substrate for robot imagination, but visual or latent prediction alone does not determine whether an imagined future satisfies task-relevant predicates. Long-horizon manipulation requires progress signals that are relational, predicate-level, and physically grounded: whether an object has moved, whether a drawer or contact state has changed, whether a placement predicate is satisfied, and whether a candidate future is reliable enough for execution. We introduce \textbf{EV-WM}, a predicate-grounded verification framework for world-model planning. EV-WM rolls out candidate futures in pretrained visual-feature space, decodes them into structured event states, and scores them using task-progress, semantic-consistency, physical-feasibility, and uncertainty terms. The verifier guides sampling-based planning, gates candidate actions, and, in the contact-sensitive LIBERO wine-rack setting, selects among PPO-generated proposals. Across navigation, deformable-object, wall-constrained, and language-described manipulation studies, EV-WM shows that predicate-grounded verification can make feature-space world-model planning more interpretable and better aligned with task progress.
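As an illustration of the verify-then-select loop described above, the following is a minimal sketch and not the EV-WM implementation. It assumes that each candidate has already been rolled out and decoded into four scalar terms in $[0, 1]$, that the terms are combined linearly with uncertainty as a penalty, and that gating is a pair of hard thresholds. The names `EventScores`, `verifier_score`, `select_candidate`, and `rollout_and_decode`, along with all weights and thresholds, are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")  # a candidate action sequence, in whatever form the planner uses


@dataclass(frozen=True)
class EventScores:
    """Hypothetical verifier terms for one imagined future, each assumed in [0, 1]."""
    progress: float      # task-progress term
    semantic: float      # semantic-consistency term
    feasibility: float   # physical-feasibility term
    uncertainty: float   # higher means the rollout is less reliable


def verifier_score(s: EventScores,
                   w_progress: float = 1.0,
                   w_semantic: float = 1.0,
                   w_feasibility: float = 1.0,
                   w_uncertainty: float = 1.0) -> float:
    """Assumed linear combination; uncertainty enters as a penalty."""
    return (w_progress * s.progress
            + w_semantic * s.semantic
            + w_feasibility * s.feasibility
            - w_uncertainty * s.uncertainty)


def select_candidate(candidates: Sequence[T],
                     rollout_and_decode: Callable[[T], EventScores],
                     min_feasibility: float = 0.5,
                     max_uncertainty: float = 0.5) -> Optional[int]:
    """Return the index of the best candidate that passes the gate, or None."""
    best_index: Optional[int] = None
    best_score = float("-inf")
    for i, candidate in enumerate(candidates):
        # Stand-in for the world-model rollout plus event-state decoding.
        scores = rollout_and_decode(candidate)
        # Gate: reject futures judged infeasible or too uncertain to execute.
        if scores.feasibility < min_feasibility or scores.uncertainty > max_uncertainty:
            continue
        value = verifier_score(scores)
        if value > best_score:
            best_index, best_score = i, value
    return best_index
```

In this sketch the same routine covers both uses mentioned above: the candidates may come from a sampling-based planner or from an external proposal policy such as PPO, since the verifier only needs the decoded scores for each proposal.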