Latent world models are a promising approach for learning state representations and dynamics directly from high-dimensional observations, enabling robot control in hard-to-model settings. However, control performance ultimately depends on whether the latent representation encodes the information the task requires. In this work, we study latent-space safe control problems and show how partial observability can induce control failures when safety-relevant information is not preserved in the latent state. Specifically, we identify two world model failure modes: estimation gaps, where current observations do not reveal safety-critical quantities (e.g., temperature in a cooking task), and prediction gaps, where failures are observable once they occur but cannot be reliably anticipated from available observations. We introduce two diagnostics for these gaps: a mutual-information-based measure of safety observability and a rollout-based measure of future safety predictability. Finally, we present a mitigation strategy for each failure mode: privileged multimodal supervision for estimation gaps and conformal risk calibration for prediction gaps. Across two hardware case studies -- using unimodal RGB world models and multimodal RGB+Tactile and RGB+Thermal variants -- we show that these mitigation strategies improve the safety of a Franka Research 3 manipulator on challenging cooking tasks under partial observability, albeit at the cost of more conservative behavior. More broadly, our work raises the question of when world model state representations are sufficient for reliable robot control.