A world model learns the internal dynamics of a system and uses them to build a surrogate world in which a controller can be trained and safety violations can be predicted. However, existing world models rely solely on statistical learning of how observations change in response to actions and provide no precise quantification of how accurate the surrogate dynamics are, which poses a significant challenge in safety-critical systems. To address this challenge, we propose foundation world models that embed observations into meaningful, causal latent representations. This enables the surrogate dynamics to directly predict causal future states by leveraging a training-free large language model. On two common benchmarks, the proposed model outperforms standard world models in the safety prediction task and achieves performance comparable to supervised learning despite not using any data. We evaluate its performance with a more specialized, system-relevant metric that compares estimated states instead of aggregating observation-wide error.
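The difference between an observation-wide error and a state-based error can be illustrated with a toy sketch. This is not the paper's metric or benchmark: the one-dimensional "image" renderer, the brightest-pixel state estimator, and all function names below are hypothetical and chosen only to make the contrast concrete.

```python
def render(position, width=10):
    """Hypothetical renderer: a 1-D 'image' with one bright pixel at the object's position."""
    return [1.0 if i == position else 0.0 for i in range(width)]


def observation_mse(pred_obs, true_obs):
    """Observation-wide error: mean squared error aggregated over all pixels."""
    return sum((p - t) ** 2 for p, t in zip(pred_obs, true_obs)) / len(true_obs)


def estimate_state(obs):
    """Hypothetical state estimator: the index of the brightest pixel."""
    return max(range(len(obs)), key=lambda i: obs[i])


def state_error(pred_obs, true_obs):
    """State-based error: absolute distance between the estimated positions."""
    return abs(estimate_state(pred_obs) - estimate_state(true_obs))


true_obs = render(2)
near_miss = render(3)  # predicted position off by one pixel
far_miss = render(8)   # predicted position off by six pixels

mse_near = observation_mse(near_miss, true_obs)
mse_far = observation_mse(far_miss, true_obs)
err_near = state_error(near_miss, true_obs)
err_far = state_error(far_miss, true_obs)

print(mse_near, mse_far)  # identical pixel-wise errors
print(err_near, err_far)  # state errors differ
```

In this toy setting the pixel-wise error cannot distinguish a near miss from a far miss, whereas the state-based error can, which is the kind of distinction that matters when predicting safety violations.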