How agents can learn internal models that veridically represent interactions with the real world remains a largely open question. As machine learning moves towards representations containing not just observational but also interventional knowledge, we study this problem using tools from representation learning and group theory. We propose methods that enable an agent acting upon the world to learn internal representations of sensory information that are consistent with the actions that modify it. We use an autoencoder equipped with a group representation acting on its latent space, trained with an equivariance-derived loss to enforce a suitable homomorphism property on the group representation. In contrast to existing work, our approach does not require prior knowledge of the group and does not restrict the set of actions the agent can perform. We motivate our method theoretically and show empirically that it can learn a group representation of the actions, thereby capturing the structure of the set of transformations applied to the environment. We further show that this allows agents to predict the effect of sequences of future actions with improved accuracy.
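To make the idea of a group representation acting on the latent space concrete, the following is a minimal sketch, not the method of the paper. It assumes a toy environment (a one-hot position on a ring of `n` cells with "left" and "right" actions), identity functions standing in for a trained encoder and decoder, and hand-written matrices `rho` in place of learned ones. The names `equivariance_loss` and `predict_sequence` are hypothetical, and the loss shown is a plain reconstruction error after acting in the latent space, which may differ from the equivariance-derived loss actually used.

```python
import numpy as np

def equivariance_loss(encode, decode, rho, x, action, x_next):
    """Mean squared error between decode(rho[action] @ encode(x)) and x_next."""
    z = encode(x)                 # latent code of the current observation
    z_pred = rho[action] @ z      # apply the action's matrix in latent space
    return float(np.mean((decode(z_pred) - x_next) ** 2))

def predict_sequence(encode, decode, rho, x, actions):
    """Predict the observation after a sequence of actions by composing matrices."""
    z = encode(x)
    for a in actions:
        z = rho[a] @ z
    return decode(z)

# Toy environment (an assumption for illustration): one-hot position on a ring.
n = 5
shift_right = np.roll(np.eye(n), 1, axis=0)   # maps basis vector e_i to e_{i+1 mod n}
rho = {0: shift_right.T, 1: shift_right}      # action 0 = left, action 1 = right
identity = lambda v: v                        # stand-in for a trained encoder/decoder

x = np.eye(n)[0]                              # agent starts in cell 0
x_next = np.roll(x, 1)                        # true observation after moving right
loss = equivariance_loss(identity, identity, rho, x, 1, x_next)

# Right, right, left: composing the matrices predicts the multi-step outcome.
pred = predict_sequence(identity, identity, rho, x, [1, 1, 0])
```

In this toy setting the matrices satisfy the homomorphism property exactly (for instance, `rho[0] @ rho[1]` is the identity, since left undoes right), so the one-step loss is zero and multi-step predictions follow from matrix products. In the learned setting described in the abstract, the matrices and the autoencoder would instead be fitted to data, without the group being specified in advance.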