Dexterous manipulation requires planning a grasp configuration suited to the object and task, then executing it through coordinated multi-finger control. However, specifying grasp plans with dense pose or contact targets for every object and task is impractical. Meanwhile, end-to-end reinforcement learning from task rewards alone lacks controllability, making it difficult for users to intervene when failures occur. To address both limitations, we present GRIT, a two-stage framework that learns dexterous control from sparse taxonomy guidance. GRIT first predicts a taxonomy-based grasp specification from the scene and task context. Conditioned on this sparse command, a policy then generates continuous finger motions that accomplish the task while preserving the intended grasp structure. Our results show that certain grasp taxonomies are more effective for specific object geometries. By leveraging this relationship, GRIT improves generalization to novel objects over baselines and achieves an overall success rate of 87.9%. Moreover, real-world experiments demonstrate controllability: grasp strategies can be adjusted through high-level taxonomy selection based on object geometry and task intent.