Pretrained-feature world models provide a useful substrate for robot imagination, but visual or latent prediction alone does not determine whether an imagined future satisfies task-relevant events. Long-horizon manipulation requires progress signals that are relational, predicate-level, and physically grounded: whether an object has moved, whether a drawer or contact state has changed, whether a placement predicate is satisfied, and whether a candidate future is reliable enough for execution. We introduce EA-WM, an event-aware world-model framework that augments frozen visual-feature dynamics with task-specification-grounded event prediction and verification. EA-WM rolls out candidate futures in pretrained visual-feature space, decodes them into structured event states, and scores them using task-progress, semantic-consistency, physical-feasibility, and uncertainty terms. The verifier guides sampling-based planning, gates candidate actions, and, in the contact-sensitive LIBERO wine-rack setting, selects among PPO-generated proposals. Across navigation, deformable-object, wall-constrained, and language-described manipulation studies, EA-WM shows that event-aware verification can make feature-space world models more interpretable and better aligned with task progress.
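To make the scoring-and-gating idea concrete, the following is a minimal sketch of how a verifier might combine the four terms named above and gate candidate futures. The abstract does not specify how the terms are combined, so the weighted-sum form, the weights, the thresholds, and all names (`EventScores`, `VerifierWeights`, `verifier_score`, `select_candidate`) are illustrative assumptions, not EA-WM's actual implementation.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class EventScores:
    """Hypothetical verifier terms for one candidate future, each in [0, 1]."""
    progress: float      # task-progress term
    consistency: float   # semantic-consistency term
    feasibility: float   # physical-feasibility term
    uncertainty: float   # predictive uncertainty (higher is worse)


@dataclass(frozen=True)
class VerifierWeights:
    """Assumed weights; the source does not give values."""
    progress: float = 1.0
    consistency: float = 0.5
    feasibility: float = 0.5
    uncertainty: float = 0.5


def verifier_score(s: EventScores, w: VerifierWeights = VerifierWeights()) -> float:
    """Assumed linear combination: reward the first three terms, penalize uncertainty."""
    return (w.progress * s.progress
            + w.consistency * s.consistency
            + w.feasibility * s.feasibility
            - w.uncertainty * s.uncertainty)


def select_candidate(candidates: Sequence[EventScores],
                     w: VerifierWeights = VerifierWeights(),
                     max_uncertainty: float = 0.6,
                     min_feasibility: float = 0.3) -> Optional[int]:
    """Return the index of the best candidate that passes the gate, or None.

    The gate (assumed thresholds) rejects futures that are too uncertain
    or judged physically infeasible before any ranking takes place.
    """
    best_idx, best_score = None, float("-inf")
    for i, s in enumerate(candidates):
        if s.uncertainty > max_uncertainty or s.feasibility < min_feasibility:
            continue  # gated out: not reliable enough for execution
        score = verifier_score(s, w)
        if score > best_score:
            best_idx, best_score = i, score
    return best_idx


# Three imagined futures, already decoded into (made-up) verifier terms.
candidates = [
    EventScores(0.9, 0.8, 0.9, 0.8),  # high progress, but too uncertain
    EventScores(0.6, 0.7, 0.8, 0.2),  # moderate progress, reliable
    EventScores(0.4, 0.9, 0.2, 0.1),  # judged physically infeasible
]
best = select_candidate(candidates)
```

In this toy setup the first candidate has the highest raw progress but is rejected by the uncertainty gate, so the second candidate is selected. Proposals from a sampling-based planner or from a learned policy could be ranked through the same interface once their rollouts are decoded into such terms.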