What makes generalization hard for imitation learning in visual robotic manipulation? This question is difficult to approach at face value, but the environment from the perspective of a robot can often be decomposed into enumerable factors of variation, such as the lighting conditions or the placement of the camera. Empirically, generalization to some of these factors has presented a greater obstacle than generalization to others, but existing work sheds little light on precisely how much each factor contributes to the generalization gap. Towards an answer to this question, we study imitation learning policies in simulation and on a real-robot language-conditioned manipulation task to quantify the difficulty of generalization to different (sets of) factors. We also design a new simulated benchmark of 19 tasks with 11 factors of variation to facilitate more controlled evaluations of generalization. From our study, we determine an ordering of factors based on generalization difficulty that is consistent across simulation and our real-robot setup.