Complex, long-horizon planning and its combinatorial nature pose steep challenges for learning-based agents. These difficulties are exacerbated in low-data regimes, where over-fitting stifles generalization and compounding errors hurt accuracy. In this work, we explore the use of an often overlooked source of auxiliary supervision: language. Inspired by recent advances in transformer-based models, we train agents with an instruction prediction loss that encourages learning temporally extended representations that operate at a high level of abstraction. Concretely, we demonstrate that instruction modeling significantly improves performance in planning environments when training with a limited number of demonstrations on the BabyAI and Crafter benchmarks. In further analysis, we find that instruction modeling is most important for tasks that require complex reasoning, while understandably offering smaller gains in environments that require only simple plans. More details and code can be found at https://github.com/jhejna/instruction-prediction.
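As a rough illustration of how an auxiliary instruction prediction loss can be combined with learning from demonstrations, consider the minimal sketch below. It assumes that both the action objective and the instruction objective are cross-entropy terms and that they are mixed with a scalar weight; the function names and the `instr_weight` parameter are hypothetical and chosen only for this example, and the sketch should not be read as the exact formulation used in our released code.

```python
import math
from typing import Sequence


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-likelihood of `target` under softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target]


def mean_cross_entropy(
    batch_logits: Sequence[Sequence[float]], targets: Sequence[int]
) -> float:
    """Average cross-entropy over a sequence of predictions."""
    if len(batch_logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    if not targets:
        return 0.0
    total = sum(cross_entropy(l, t) for l, t in zip(batch_logits, targets))
    return total / len(targets)


def combined_loss(
    action_logits: Sequence[Sequence[float]],
    actions: Sequence[int],
    instr_logits: Sequence[Sequence[float]],
    instr_tokens: Sequence[int],
    instr_weight: float = 0.5,  # hypothetical mixing weight (an assumption)
) -> float:
    """Action imitation loss plus a weighted instruction prediction loss."""
    action_loss = mean_cross_entropy(action_logits, actions)
    instr_loss = mean_cross_entropy(instr_logits, instr_tokens)
    return action_loss + instr_weight * instr_loss


# Toy example: two timesteps with 4 possible actions, and a 3-token
# instruction over a 2-word vocabulary, all with uniform logits.
loss = combined_loss(
    action_logits=[[0.0] * 4, [0.0] * 4],
    actions=[1, 3],
    instr_logits=[[0.0, 0.0]] * 3,
    instr_tokens=[0, 1, 1],
)
```

Setting `instr_weight` to zero recovers plain imitation of the demonstrated actions, which makes the language term easy to ablate in this sketch.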