We present a novel method for collaborative robots (cobots) to learn manipulation tasks and perform them in a human-like manner. Our method falls under the learn-from-observation (LfO) paradigm, in which robots learn to perform tasks by observing human actions; this facilitates quicker integration into industrial settings than programming from scratch. We introduce Visual IRL, which uses RGB-D keypoints from each frame of the observed human task performance directly as state features for inverse reinforcement learning (IRL). The inversely learned reward function, which maps keypoints to reward values, is transferred from the human to the cobot using a novel neuro-symbolic dynamics model that maps human kinematics to the cobot arm. This model achieves similar end-effector positioning while minimizing joint adjustments, aiming to preserve the natural dynamics of human motion in robotic manipulation. In contrast with previous techniques that focus only on end-effector placement, our method maps multiple joint angles of the human arm to the corresponding cobot joints and then uses an inverse kinematics model to minimally adjust those joint angles for accurate end-effector positioning. We evaluate this approach on two realistic manipulation tasks. The first is produce processing, which involves picking, inspecting, and placing onions depending on whether they are blemished. The second is liquid pouring, in which the robot picks up bottles, pours their contents into designated containers, and disposes of the empty bottles. Our results demonstrate advances in human-like robotic manipulation, leading to greater human-robot compatibility in manufacturing applications.
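The minimal-joint-adjustment idea can be illustrated with a small sketch. This is not the paper's implementation: it assumes a simple planar two-link arm, a hypothetical `minimal_ik_adjust` routine, and a damped least-squares update; the point is only to show how one can start from human-derived joint angles and nudge them just enough to reach the desired end-effector position, rather than solving IK from scratch.

```python
import numpy as np

def fk(thetas, lengths):
    """Planar forward kinematics with cumulative joint angles."""
    angles = np.cumsum(thetas)
    return np.array([np.sum(lengths * np.cos(angles)),
                     np.sum(lengths * np.sin(angles))])

def minimal_ik_adjust(human_thetas, lengths, target, lr=0.1, iters=500):
    """Start from human-derived joint angles and minimally adjust them
    until the end effector reaches `target` (hypothetical routine)."""
    thetas = np.array(human_thetas, dtype=float)
    n = len(thetas)
    for _ in range(iters):
        err = target - fk(thetas, lengths)
        if np.linalg.norm(err) < 1e-4:
            break
        # Numerical Jacobian of the end-effector position w.r.t. joints.
        J = np.zeros((2, n))
        eps = 1e-6
        for i in range(n):
            d = np.zeros(n)
            d[i] = eps
            J[:, i] = (fk(thetas + d, lengths) - fk(thetas, lengths)) / eps
        # Damped least-squares step: keeps each joint change small,
        # so the result stays close to the human posture.
        dtheta = J.T @ np.linalg.solve(J @ J.T + 1e-3 * np.eye(2), err)
        thetas += lr * dtheta
    return thetas

lengths = np.array([1.0, 1.0])
human = [0.5, 0.5]                 # joint angles observed from the human
target = np.array([1.35, 1.35])    # desired cobot end-effector position
adjusted = minimal_ik_adjust(human, lengths, target)
```

Because the update starts at the human posture and takes small damped steps, the solution preserves the human-like joint configuration while correcting the end-effector error.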