Learning fine-grained movements is a challenging problem in robotics, particularly for robotic hands. One instance of this challenge is teaching robots to fingerspell in sign language. In this paper, we propose an approach for learning dexterous motor imitation from video examples alone, without additional information. We first build a URDF model of a robotic hand with a single actuator for each joint. We then use pre-trained deep vision models to extract the 3D pose of the hand from RGB videos. Next, using state-of-the-art reinforcement learning algorithms for motion imitation, namely proximal policy optimization (PPO) and soft actor-critic (SAC), we train a policy to reproduce the movements extracted from the demonstrations. We identify the best-performing set of hyperparameters for imitation based on a reference motion. Finally, we demonstrate the generalizability of our approach by testing it on six different tasks, each corresponding to a fingerspelled letter. Our results show that the approach successfully imitates these fine-grained movements without additional information, highlighting its potential for real-world applications in robotics.
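To make the imitation step concrete, the following is a minimal sketch of a pose-tracking reward of the kind often used in motion imitation with reinforcement learning. The abstract does not specify the reward, so the exponentiated squared-error form, the `scale` parameter, and the function names `imitation_reward` and `episode_return` are all assumptions made for illustration, not the method used in the paper.

```python
import math


def imitation_reward(joint_angles, reference_angles, scale=2.0):
    """Hypothetical pose-tracking reward in (0, 1].

    Returns 1.0 when the hand's joint angles match the reference pose
    and decays toward 0 as the squared joint-angle error grows.
    The value of `scale` is an assumed, untuned constant.
    """
    if len(joint_angles) != len(reference_angles):
        raise ValueError("pose and reference must have the same number of joints")
    # Sum of squared per-joint errors (angles in radians).
    sq_err = sum((q - q_ref) ** 2 for q, q_ref in zip(joint_angles, reference_angles))
    return math.exp(-scale * sq_err)


def episode_return(trajectory, reference):
    """Undiscounted sum of per-step rewards over a reference motion.

    `trajectory` and `reference` are sequences of joint-angle lists,
    one list per time step; extra steps in the longer one are ignored.
    """
    return sum(imitation_reward(q, q_ref) for q, q_ref in zip(trajectory, reference))
```

In a setup like the one described, the reference angles would come from the 3D hand pose extracted from the RGB video, and a policy trained with PPO or SAC would be optimized to maximize the return over each fingerspelled letter's motion.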