Deep learning models are increasingly employed for perception, prediction, and control in complex systems. Embedding physical knowledge into these models is crucial for achieving realistic and consistent outputs, a challenge often addressed by physics-informed machine learning. However, integrating physical knowledge with representation learning becomes difficult when dealing with high-dimensional observation data, such as images, particularly under conditions of incomplete or imprecise state information. To address this, we propose Physically Interpretable World Models, a novel architecture that aligns learned latent representations with real-world physical quantities. Our method combines a variational autoencoder with a dynamical model that incorporates unknown system parameters, enabling the discovery of physically meaningful representations. By employing weak supervision with interval-based constraints, our approach eliminates the reliance on ground-truth physical annotations. Experimental results demonstrate that our method improves the quality of learned representations while achieving accurate predictions of future states, advancing the field of representation learning in dynamic systems.
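The interval-based weak supervision mentioned above can be illustrated with a minimal sketch: a penalty that is zero whenever a latent dimension falls inside a physically plausible interval and grows with the violation otherwise. The function name and the squared-hinge form here are illustrative assumptions, not the paper's exact loss.

```python
import numpy as np

def interval_penalty(z, lo, hi):
    """Squared-hinge penalty on latent vector z given per-dimension
    physically plausible intervals [lo, hi] (a hypothetical form of
    the interval-based constraint; zero when z lies inside the box)."""
    z, lo, hi = (np.asarray(a, dtype=float) for a in (z, lo, hi))
    below = np.maximum(lo - z, 0.0)  # violation below the lower bound
    above = np.maximum(z - hi, 0.0)  # violation above the upper bound
    return float(np.sum(below**2 + above**2))

# A latent inside the intervals incurs no penalty; one outside is
# penalized quadratically by its distance to the nearest bound.
```

In training, such a term would be added to the usual reconstruction and prediction losses, so the encoder is steered toward physically meaningful latents without ground-truth annotations.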