Humans naturally use linguistic instructions to convey knowledge, but doing so is considerably harder for machines, especially in multitask robotic manipulation. Natural language is also the primary medium through which humans acquire new knowledge, which makes it a potentially intuitive bridge for translating human-understandable concepts into formats that machines can learn. To facilitate this integration, we introduce Ex-PERACT, an explainable behavior cloning agent designed for manipulation tasks. The agent has a hierarchical structure that incorporates natural language to enhance learning. At the top level, the model learns a discrete skill code; at the bottom level, the policy network casts the problem as a voxelized grid and maps discretized actions onto that grid. We evaluate our method on eight challenging manipulation tasks from the RLBench benchmark and show that Ex-PERACT not only achieves competitive policy performance but also bridges the gap between human instructions and machine execution in complex environments.
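To make the bottom-level idea concrete, the following is a minimal sketch of how a continuous 3D position might be discretized into a voxel index and how a point cloud might be reduced to a set of occupied voxels. It is illustrative only and not the Ex-PERACT implementation: the function names (`position_to_voxel`, `voxel_to_position`, `voxelize`), the axis-aligned workspace bounds, and the grid size are all assumptions introduced here, and a real agent would typically also carry per-voxel features and handle rotation and gripper state.

```python
import math
from typing import Iterable, Sequence, Set, Tuple

Vec3 = Tuple[float, float, float]
Index3 = Tuple[int, int, int]
Bounds = Tuple[Vec3, Vec3]  # (lower corner, upper corner); hypothetical workspace box


def position_to_voxel(pos: Sequence[float], bounds: Bounds, grid_size: int) -> Index3:
    """Map a continuous 3D position to a discrete voxel index.

    Positions outside the bounds are clamped to the nearest boundary voxel.
    """
    lo, hi = bounds
    idx = []
    for p, l, h in zip(pos, lo, hi):
        frac = (p - l) / (h - l)             # normalized coordinate in [0, 1]
        i = math.floor(frac * grid_size)     # bin along this axis
        idx.append(min(max(i, 0), grid_size - 1))
    return (idx[0], idx[1], idx[2])


def voxel_to_position(idx: Sequence[int], bounds: Bounds, grid_size: int) -> Vec3:
    """Return the center of a voxel, i.e. the continuous action it stands for."""
    lo, hi = bounds
    pos = [l + (i + 0.5) * (h - l) / grid_size for i, l, h in zip(idx, lo, hi)]
    return (pos[0], pos[1], pos[2])


def voxelize(points: Iterable[Sequence[float]], bounds: Bounds, grid_size: int) -> Set[Index3]:
    """Collect the set of voxels occupied by at least one in-bounds point."""
    lo, hi = bounds
    occupied: Set[Index3] = set()
    for pt in points:
        if all(l <= p <= h for p, l, h in zip(pt, lo, hi)):
            occupied.add(position_to_voxel(pt, bounds, grid_size))
    return occupied


if __name__ == "__main__":
    workspace: Bounds = ((0.0, 0.0, 0.0), (1.0, 1.0, 1.0))  # assumed unit cube
    target = position_to_voxel((0.25, 0.5, 0.99), workspace, grid_size=10)
    print(target)
    print(voxel_to_position(target, workspace, grid_size=10))
```

Under this framing, predicting a translation action reduces to choosing one voxel out of `grid_size ** 3` candidates, and the chosen index is mapped back to a metric target through the voxel center. The resolution trades precision against the size of the discrete action space.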