In this work, we propose a novel discriminative framework for dexterous grasp generation, named Dexterous Grasp TRansformer (DGTR), capable of predicting a diverse set of feasible grasp poses from an object point cloud in a single forward pass. We formulate dexterous grasp generation as a set prediction task and design a transformer-based grasping model for it. However, we identify that this set prediction paradigm encounters several optimization challenges in the field of dexterous grasping, which restrict performance. To address these issues, we propose progressive strategies for both the training and testing phases. First, the dynamic-static matching training (DSMT) strategy is presented to enhance optimization stability during the training phase. Second, we introduce adversarial-balanced test-time adaptation (AB-TTA) with a pair of adversarial losses to improve grasping quality during the testing phase. Experimental results on the DexGraspNet dataset demonstrate the capability of DGTR to predict dexterous grasp poses with both high quality and diversity. Notably, while maintaining high quality, the diversity of grasp poses predicted by DGTR significantly outperforms previous works on multiple metrics, without any data preprocessing. Code is available at https://github.com/iSEE-Laboratory/DGTR .
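As background on the set prediction formulation mentioned above: in such paradigms (e.g., DETR-style models), a fixed set of query predictions is matched one-to-one against ground-truth targets via bipartite matching before the loss is computed. The sketch below is a minimal, generic illustration of that matching step, not DGTR's actual implementation; the L2 pose cost and the function name `match_grasps` are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def match_grasps(pred_poses, gt_poses):
    """Bipartite (Hungarian) matching between predicted and ground-truth poses.

    pred_poses: (Q, D) array of Q predicted pose vectors (query outputs).
    gt_poses:   (G, D) array of G ground-truth pose vectors, G <= Q.
    Returns (pred_idx, gt_idx): each ground truth paired with one prediction.
    NOTE: the plain L2 cost here is a simplifying assumption for illustration.
    """
    # Pairwise L2 cost between every prediction and every ground truth, (Q, G).
    cost = np.linalg.norm(pred_poses[:, None, :] - gt_poses[None, :, :], axis=-1)
    pred_idx, gt_idx = linear_sum_assignment(cost)
    return pred_idx, gt_idx


# Toy example: 4 query predictions, 2 ground-truth grasps.
rng = np.random.default_rng(0)
pred = rng.normal(size=(4, 6))
gt = pred[[2, 0]] + 0.01  # ground truths near predictions 2 and 0
p, g = match_grasps(pred, gt)
```

Only the matched predictions receive regression losses; unmatched queries are typically supervised as "no grasp", which is what lets a single forward pass emit a diverse set of candidates.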