World models provide a powerful framework for simulating environment dynamics conditioned on actions or instructions, enabling downstream tasks such as action planning and policy learning. Recent approaches leverage world models as learned simulators, but their application to decision-time planning remains computationally prohibitive for real-time control. A key bottleneck lies in the latent representation: conventional tokenizers encode each observation into hundreds of tokens, making planning both slow and resource-intensive. To address this, we propose CompACT, a discrete tokenizer that compresses each observation into as few as 8 tokens, drastically reducing computational cost while preserving the information essential for planning. An action-conditioned world model that adopts the CompACT tokenizer achieves competitive planning performance while planning orders of magnitude faster, offering a practical step toward real-world deployment of world models.