A world model, the algorithmic simulator of the real-world environment that biological agents experience and act upon, has emerged as a topic of growing interest in recent years, driven by the rising need to develop virtual agents with artificial (general) intelligence. There has been much discussion of what a world model really is, how to build it, how to use it, and how to evaluate it. In this essay, starting from the imagination in the famed sci-fi classic Dune and drawing inspiration from the concept of ``hypothetical thinking'' in the psychology literature, we argue that the primary goal of a world model is {\it simulating all actionable possibilities of the real world for purposeful reasoning and acting}. We examine the key design dimensions of world modeling (data, representation, architecture, learning objective, and usage), surveying existing approaches and analyzing their tradeoffs. Building on this examination, we propose a new Generative Latent Prediction (GLP) architecture for a general-purpose world model, based on stateful, hierarchical, multi-level, and mixed continuous/discrete representations and on a generative, self-supervised learning framework. We close with an outlook on a Physical, Agentic, and Nested (PAN) AGI system enabled by such a model.