Enabling robots to dexterously grasp and manipulate objects based on human commands is a promising direction in robotics. However, existing approaches struggle to generalize across diverse objects and tasks due to the limited scale of semantic dexterous grasp datasets. Foundation models offer a new way to enhance generalization, yet directly leveraging them to generate feasible robotic actions remains challenging due to the gap between abstract model knowledge and physical robot execution. To address these challenges, we propose OmniDexGrasp, a generalizable framework that achieves omni-capabilities in user prompting, dexterous embodiment, and grasping tasks by combining foundation models with transfer and control strategies. OmniDexGrasp integrates three key modules: (i) foundation models generate human grasp images, enhancing generalization and supporting omni-capability across user prompts and tasks; (ii) a human-image-to-robot-action transfer strategy converts human demonstrations into executable robot actions, enabling omni dexterous embodiment; (iii) a force-aware adaptive grasp strategy ensures robust and stable grasp execution. Experiments in simulation and on real robots validate the effectiveness of OmniDexGrasp across diverse user prompts, grasp tasks, and dexterous hands, and further results demonstrate its extensibility to dexterous manipulation tasks.