Children can rapidly generalize compositionally constructed rules to unseen test sets. In contrast, deep reinforcement learning (RL) agents need to be trained over millions of episodes, and their ability to generalize to unseen combinations remains unclear. Hence, we investigate the compositional abilities of RL agents, using the task of navigating to specified color-shape targets in synthetic 3D environments. First, we show that when RL agents are naively trained to navigate to target color-shape combinations, they implicitly learn to decompose the combinations, allowing them to (re-)compose the components and succeed at held-out test combinations ("compositional learning"). Second, when agents are pretrained to learn invariant shape and color concepts ("concept learning"), the number of episodes subsequently needed for compositional learning decreases by a factor of 20. Furthermore, only agents trained on both concept and compositional learning can solve a more complex, out-of-distribution environment in zero-shot fashion. Finally, we verify that only text encoders pretrained on image-text datasets (e.g. CLIP) reduce the number of training episodes needed for our agents to demonstrate compositional learning, and that they also support zero-shot generalization to 5 unseen colors. Overall, our results are the first to demonstrate that RL agents can be trained to implicitly learn concepts and compositionality, and thereby solve more complex environments in zero-shot fashion.
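To make the notion of held-out test combinations concrete, the following minimal sketch shows one way a color-shape split could be constructed so that every color and every shape is seen during training while some pairings are reserved for testing. The color and shape names, the `split_combinations` helper, and the diagonal hold-out rule are illustrative assumptions, not the split used in this work.

```python
from itertools import product


def split_combinations(colors, shapes):
    """Split all (color, shape) pairs into train and held-out test sets.

    Hypothetical rule: hold out the "diagonal" pairs, so each color and each
    shape still appears in training, only in other pairings.
    Requires at least two shapes so that no color loses all its pairs.
    """
    train, test = [], []
    for (i, color), (j, shape) in product(enumerate(colors), enumerate(shapes)):
        if i % len(shapes) == j:
            test.append((color, shape))   # unseen combination at test time
        else:
            train.append((color, shape))  # combination seen during training
    return train, test


# Illustrative vocabulary (assumed, not taken from the paper).
colors = ["red", "green", "blue"]
shapes = ["cube", "sphere", "cone"]
train, test = split_combinations(colors, shapes)
```

Under such a split, an agent that succeeds on the test pairs cannot have memorized them and must have recombined color and shape knowledge acquired from other pairings.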