Causal dynamics models (CDMs) have demonstrated significant potential in addressing various challenges in reinforcement learning. To learn CDMs, recent studies have performed causal discovery to capture the causal dependencies among environmental variables. However, the learning of CDMs is still confined to small-scale environments because of high computational complexity and limited sample efficiency. This paper aims to extend CDMs to large-scale object-oriented environments, which consist of many objects grouped into different classes. We introduce the Object-Oriented CDM (OOCDM), which shares causalities and parameters among objects belonging to the same class. Furthermore, we propose a learning method for OOCDM that enables it to adapt to a varying number of objects. Experiments on large-scale tasks indicate that OOCDM outperforms existing CDMs in terms of causal discovery, prediction accuracy, generalization, and computational efficiency.
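To make the idea of class-level sharing concrete, the following is a minimal sketch, not the OOCDM architecture itself, which the abstract does not specify. It assumes a toy linear update, a mean aggregator over the other objects, and hypothetical names (`Obj`, `SHARED_PARAMS`, `step`) and parameter values. It illustrates only two points: parameters are stored once per class rather than once per object, and a permutation-invariant aggregate lets the same parameters apply to any number of objects.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Obj:
    cls: str            # class label, e.g. "ball" (hypothetical)
    state: List[float]  # all objects are assumed to share one state dimension


# One parameter set per CLASS, not per object (illustrative values only).
SHARED_PARAMS: Dict[str, Dict[str, float]] = {
    "ball": {"self_w": 0.9, "peer_w": 0.1},
    "wall": {"self_w": 1.0, "peer_w": 0.0},
}


def step(objects: List[Obj]) -> List[Obj]:
    """Predict each object's next state from its class's shared parameters."""
    next_objects = []
    for i, obj in enumerate(objects):
        params = SHARED_PARAMS[obj.cls]
        dim = len(obj.state)
        peers = [o.state for j, o in enumerate(objects) if j != i]
        # The mean over peers does not depend on their order or count,
        # so the same parameters work for any number of objects.
        if peers:
            agg = [sum(s[k] for s in peers) / len(peers) for k in range(dim)]
        else:
            agg = [0.0] * dim
        new_state = [
            params["self_w"] * obj.state[k] + params["peer_w"] * agg[k]
            for k in range(dim)
        ]
        next_objects.append(Obj(obj.cls, new_state))
    return next_objects
```

Calling `step` on a list with one object or with a hundred uses the same two entries in `SHARED_PARAMS`, so the parameter count grows with the number of classes rather than with the number of objects.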