World models aim to capture the states and dynamics of an environment in a compact latent space. Boolean state representations are particularly useful for search heuristics and for symbolic reasoning and planning. Existing approaches keep latents informative via decoder-based reconstruction, or alternatively via contrastive or reward signals. In this work, we introduce Discrete World Models via Regularization (DWMR): a reconstruction-free and contrastive-free method for unsupervised Boolean world-model learning. At its core is a novel world-modeling loss that couples latent prediction with specialized regularizers. These regularizers maximize the entropy and independence of the representation bits through variance, correlation, and coskewness penalties, while simultaneously enforcing a locality prior under which actions change only a few bits. To enable effective optimization, we also propose a training scheme that improves robustness to discrete roll-outs. Experiments on two benchmarks with underlying combinatorial structure show that DWMR learns more accurate representations and transitions than reconstruction-based alternatives. Finally, DWMR can also be paired with an auxiliary reconstruction decoder, and this combination yields additional gains.
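To make the entropy and independence regularizers concrete, the following is a minimal numpy sketch of how variance, correlation, and coskewness penalties on relaxed Boolean latents might look. The function name, the hinge threshold of 0.5, and the exact normalization are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def dwmr_regularizers(z):
    """Hypothetical sketch of DWMR-style entropy/independence penalties.

    z: (n, d) array of relaxed Boolean latents in [0, 1].
    Returns (var_pen, corr_pen, coskew_pen); minimizing them pushes
    each bit toward high entropy and the bits toward independence.
    """
    n, d = z.shape
    zc = z - z.mean(axis=0, keepdims=True)  # center each bit

    # Variance penalty: a hinge that fires when a bit's std collapses
    # below 0.5 (the std of a fair Bernoulli bit), so every bit is used.
    std = zc.std(axis=0)
    var_pen = np.maximum(0.0, 0.5 - std).mean()

    # Correlation penalty: squared off-diagonal covariance entries,
    # encouraging pairwise independence between bits.
    cov = zc.T @ zc / n
    off = cov - np.diag(np.diag(cov))
    corr_pen = (off ** 2).sum() / (d * (d - 1))

    # Coskewness penalty: squared third-order cross-moments
    # E[z_i z_j z_k] of the centered bits, suppressing higher-order
    # dependence that pairwise decorrelation misses.
    cs = np.einsum('ni,nj,nk->ijk', zc, zc, zc) / n
    coskew_pen = (cs ** 2).mean()

    return var_pen, corr_pen, coskew_pen
```

On a batch of independent fair coin flips all three penalties are near zero, while a collapsed (constant) latent is penalized by the variance term; the locality prior on action transitions would be a separate term and is omitted here.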