Blindfolded Experts Generalize Better: Insights from Robotic Manipulation and Videogames

Behavioral cloning is a simple yet effective technique for learning sequential decision-making from demonstrations. Recently, it has gained prominence as the core of foundation models for the physical world, where achieving generalization requires countless demonstrations of a multitude of tasks. Typically, a human expert with full information about the task demonstrates (nearly) optimal behavior. In this paper, we propose to hide some of the task's information from the demonstrator. This "blindfolded" expert is compelled to employ non-trivial exploration to solve the task. We show that cloning the blindfolded expert generalizes better to unseen tasks than cloning its fully informed counterpart. We conduct experiments on real-world robot peg-insertion tasks with (limited) human demonstrations, alongside videogames from the Procgen benchmark. Additionally, we support our findings with a theoretical analysis, which shows that the generalization error scales with $\sqrt{I/m}$, where $I$ measures the amount of task information available to the demonstrator, and $m$ is the number of demonstrated tasks. Both theory and practice indicate that cloning blindfolded experts generalizes better with fewer demonstrated tasks. Project page with videos and code: https://sites.google.com/view/blindfoldedexperts/home
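
To make the $\sqrt{I/m}$ scaling concrete, the following minimal sketch evaluates only the scaling term, with all constants and lower-order terms omitted. The function names `bound_scale` and `tasks_needed` and the example values of $I$ (8 for a fully informed expert, 2 for a blindfolded one) are hypothetical and chosen purely for illustration; they do not come from the paper.

```python
import math


def bound_scale(task_info: float, num_tasks: int) -> float:
    """Return sqrt(I / m), the scaling term quoted in the abstract.

    Constants and lower-order terms of the actual bound are omitted,
    so only ratios between returned values are meaningful.
    """
    if task_info < 0 or num_tasks <= 0:
        raise ValueError("task_info must be >= 0 and num_tasks must be > 0")
    return math.sqrt(task_info / num_tasks)


def tasks_needed(task_info: float, target: float) -> int:
    """Smallest m such that sqrt(I / m) <= target, i.e. m >= I / target**2."""
    if target <= 0:
        raise ValueError("target must be > 0")
    return math.ceil(task_info / target ** 2)


# Hypothetical information levels (illustrative assumptions, not paper values).
I_FULL = 8.0         # fully informed demonstrator
I_BLINDFOLDED = 2.0  # demonstrator with part of the task information hidden

if __name__ == "__main__":
    for label, info in (("full", I_FULL), ("blindfolded", I_BLINDFOLDED)):
        print(label, bound_scale(info, 128), tasks_needed(info, 0.25))
```

Under these assumed values, reducing $I$ by a factor of four halves the scaling term at a fixed $m$, or equivalently reaches the same value of the term with a quarter of the demonstrated tasks.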
