FFHFlow: A Flow-based Variational Approach for Learning Diverse Dexterous Grasps with Shape-Aware Introspection

Synthesizing diverse dexterous grasps from uncertain partial observations is an important yet challenging task for physically intelligent embodiments. Previous works on generative grasp synthesis fell short of precisely capturing the complex grasp distribution and of reasoning about shape uncertainty in an unstructured and often only partially perceived reality. In this work, we introduce a novel model that can generate diverse grasps for a multi-fingered hand while introspectively handling perceptual uncertainty and recognizing unknown object geometry to avoid performance degradation. Specifically, we devise a Deep Latent Variable Model (DLVM) based on Normalizing Flows (NFs), which provides a hierarchical and expressive latent representation for modeling versatile grasps. Our model design counteracts typical pitfalls of its popular alternative in generative grasping, i.e., conditional Variational Autoencoders (cVAEs), whose performance is limited by mode collapse and misspecified priors. Moreover, the resultant feature hierarchy and the exact flow likelihood computation endow our model with shape-aware introspective capabilities, enabling it to quantify the shape uncertainty of partial point clouds and detect objects of novel geometry. We further achieve a performance gain by fusing this information with a discriminative grasp evaluator, yielding a novel hybrid approach to grasp evaluation. Comprehensive simulated and real-world experiments show that the proposed approach achieves superior performance and higher run-time efficiency compared with strong baselines, including diffusion models. We also demonstrate the substantial benefits of greater grasp diversity for grasping objects in clutter and in a confined workspace in the real world.
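
The exact likelihood that a normalizing flow provides can be illustrated with a minimal sketch. This is not the FFHFlow architecture: it assumes a toy elementwise affine flow with a standard normal base distribution, and the function names and parameter values (`affine_flow_logprob`, `shift`, `log_scale`) are hypothetical. It only shows how the change-of-variables formula gives a closed-form log-density, and how a low log-density can flag an input that lies far from what the model has seen.

```python
import numpy as np

def standard_normal_logpdf(z):
    """Log-density of a standard normal base distribution, summed over dimensions."""
    return -0.5 * np.sum(z**2 + np.log(2.0 * np.pi), axis=-1)

def affine_flow_logprob(x, shift, log_scale):
    """Exact log p(x) under the elementwise affine flow z = (x - shift) * exp(-log_scale).

    Change of variables: log p(x) = log p_z(z) + log|det dz/dx|,
    where log|det dz/dx| = -sum(log_scale) for this flow.
    """
    z = (x - shift) * np.exp(-log_scale)
    log_det = -np.sum(log_scale)
    return standard_normal_logpdf(z) + log_det

# Hypothetical flow parameters (illustrative values only, not learned ones).
shift = np.array([0.0, 1.0])
log_scale = np.array([0.0, np.log(2.0)])

familiar = np.array([0.1, 1.2])   # close to the mode of the modeled density
novel = np.array([8.0, -9.0])     # far from the mode

lp_familiar = affine_flow_logprob(familiar, shift, log_scale)
lp_novel = affine_flow_logprob(novel, shift, log_scale)

# A lower log-likelihood suggests the input is less typical under the model.
print(lp_familiar > lp_novel)
```

In a real flow, the affine map would be replaced by a stack of learned invertible layers, but the log-density would still be the base log-density plus the accumulated log-determinant terms.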
