In-hand tool manipulation is an operation that not only manipulates a tool within the hand (i.e., in-hand manipulation) but also ends in a grasp suitable for the subsequent task. This study aims to acquire an in-hand tool manipulation skill through deep reinforcement learning (RL). The skill is difficult to learn because the manipulation requires (A) exploring long-term contact-state changes to reach the desired grasp and (B) highly varied motions that depend on the contact-state transition. (A) makes the reward for a successful grasp sparse, and (B) forces an RL agent to explore widely within the state-action space to learn highly varied actions, which leads to sample inefficiency. To address these issues, this study proposes Action Primitives based on Contact-state Transition (APriCoT). APriCoT describes the operation as a contact-state transition built from three action representations (detach, crossover, attach) and thereby decomposes the manipulation into short-term action primitives. Within each action primitive, the fingers only need to perform short-term, similar actions. Training a separate policy for each primitive therefore mitigates both (A) and (B). This study focuses on a fundamental operation as an example of in-hand tool manipulation: rotating an elongated object held in a precision grasp by half a turn so that it returns to the initial grasp. Experimental results showed that, unlike existing methods, the proposed method achieved both the rotation and the desired grasp. Additionally, the learned policy was found to be robust to changes in object shape.
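The decomposition idea can be illustrated with a minimal sketch. The abstract does not specify the state representation, the set of fingers, or the exact semantics of each action, so everything below is an assumption made for illustration: the contact state is modeled as the set of fingers touching the object, `detach` removes a finger from that set, `attach` adds one, and `crossover` is assumed to reposition a free finger without changing the set. The names `Primitive`, `Step`, `apply_step`, and `decompose`, as well as the finger labels, are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass
from enum import Enum


class Primitive(Enum):
    """The three action representations named in the abstract."""
    DETACH = "detach"
    CROSSOVER = "crossover"
    ATTACH = "attach"


@dataclass(frozen=True)
class Step:
    """One short-term action primitive applied to one finger (hypothetical)."""
    primitive: Primitive
    finger: str  # hypothetical finger label, e.g. "index"


def apply_step(contacts: frozenset, step: Step) -> frozenset:
    """Return the contact state after one primitive, under the assumed semantics."""
    if step.primitive is Primitive.DETACH:
        if step.finger not in contacts:
            raise ValueError(f"{step.finger} is not in contact, cannot detach")
        return contacts - {step.finger}
    if step.primitive is Primitive.ATTACH:
        if step.finger in contacts:
            raise ValueError(f"{step.finger} is already in contact")
        return contacts | {step.finger}
    # CROSSOVER: assumed to move a free finger; the contact set is unchanged.
    if step.finger in contacts:
        raise ValueError(f"{step.finger} must be detached before crossover")
    return contacts


def decompose(initial: frozenset, plan: list) -> list:
    """List the contact state before the plan and after each of its steps."""
    states = [initial]
    for step in plan:
        states.append(apply_step(states[-1], step))
    return states


if __name__ == "__main__":
    grasp = frozenset({"thumb", "index", "middle"})
    plan = [
        Step(Primitive.DETACH, "index"),
        Step(Primitive.CROSSOVER, "index"),
        Step(Primitive.ATTACH, "index"),
    ]
    for state in decompose(grasp, plan):
        print(sorted(state))
```

In this toy model, each `Step` would correspond to one short-horizon learning problem with its own policy, and a plan that ends in the same contact set it started from mirrors the half-turn task's requirement of returning to the initial grasp.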