We introduce DexGanGrasp, a dexterous grasp synthesis method that generates and evaluates grasps from a single view in real time. DexGanGrasp comprises a conditional Generative Adversarial Network (cGAN)-based DexGenerator that generates dexterous grasps and a discriminator-like DexEvaluator that assesses the stability of these grasps. Extensive simulation and real-world experiments showcase the effectiveness of our proposed method, which outperforms the baseline FFHNet with an 18.57% higher success rate in real-world evaluation. We further extend DexGanGrasp to DexAfford-Prompt, an open-vocabulary affordance-grounding pipeline for dexterous grasping that leverages Multimodal Large Language Models (MLLMs) and Vision Language Models (VLMs) to achieve task-oriented grasping, with successful real-world deployments.
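To make the generator–evaluator pipeline concrete, the following is a minimal PyTorch sketch of how a cGAN-style DexGenerator and a discriminator-like DexEvaluator could be wired together at inference time. It is not the authors' released implementation: the layer sizes, the 4096-dimensional single-view object encoding, and the grasp parameterization (palm pose plus finger joint angles) are illustrative assumptions.

```python
# Minimal sketch (assumed architecture, not the paper's exact one): a cGAN-style
# grasp generator and a discriminator-like evaluator, both conditioned on a
# fixed-size object encoding extracted from a single-view observation.
import torch
import torch.nn as nn


class DexGenerator(nn.Module):
    """Maps (object encoding, latent noise) -> grasp parameters."""

    def __init__(self, obj_dim=4096, noise_dim=64, grasp_dim=6 + 15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obj_dim + noise_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, grasp_dim),
        )

    def forward(self, obj_feat, z):
        return self.net(torch.cat([obj_feat, z], dim=-1))


class DexEvaluator(nn.Module):
    """Discriminator-like network scoring grasp stability in [0, 1]."""

    def __init__(self, obj_dim=4096, grasp_dim=6 + 15):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obj_dim + grasp_dim, 512), nn.ReLU(),
            nn.Linear(512, 256), nn.ReLU(),
            nn.Linear(256, 1), nn.Sigmoid(),
        )

    def forward(self, obj_feat, grasp):
        return self.net(torch.cat([obj_feat, grasp], dim=-1))


# Inference-time usage: sample many candidate grasps, execute the top-scored one.
if __name__ == "__main__":
    gen, ev = DexGenerator(), DexEvaluator()
    obj_feat = torch.randn(1, 4096)                # single-view object encoding
    z = torch.randn(100, 64)                       # 100 latent samples
    grasps = gen(obj_feat.expand(100, -1), z)      # candidate dexterous grasps
    scores = ev(obj_feat.expand(100, -1), grasps)  # predicted stability scores
    best_grasp = grasps[scores.argmax()]           # highest-ranked grasp to execute
```

The sample-then-rank pattern shown in the usage block reflects the abstract's division of labor: the generator proposes diverse grasps while the evaluator filters them by predicted stability before execution.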