In this paper, we propose a novel 3D graph convolution-based pipeline for category-level 6D pose and size estimation from monocular RGB-D images. The proposed method leverages an efficient 3D data augmentation scheme and a novel vector-based decoupled rotation representation. Specifically, we first design an orientation-aware autoencoder with 3D graph convolution for latent feature learning. The learned latent feature is insensitive to point shift and object size thanks to the shift- and scale-invariance properties of the 3D graph convolution. Then, to efficiently decode the rotation information from the latent feature, we design a novel, flexible, vector-based decomposable rotation representation that employs two decoders to access the rotation information in a complementary manner. The proposed rotation representation has two major advantages: 1) its decoupled nature makes rotation estimation easier; 2) the flexible length and rotation angle of the vectors allow us to find a more suitable vector representation for a specific pose estimation task. Finally, we propose a 3D deformation mechanism to increase the generalization ability of the pipeline. Extensive experiments show that the proposed pipeline achieves state-of-the-art performance on category-level tasks. Further, the experiments demonstrate that the proposed rotation representation is more suitable for pose estimation tasks than other rotation representations.