Estimating 6D object poses is a major challenge in 3D computer vision. Building on successful instance-level approaches, research is shifting towards category-level pose estimation for practical applications. Current category-level datasets, however, fall short in annotation quality and pose variety. To address this, we introduce HouseCat6D, a new category-level 6D pose dataset. It 1) is multi-modal, combining polarimetric RGB and depth (RGBD+P), 2) encompasses 194 diverse objects across 10 household categories, including two photometrically challenging categories, and 3) provides high-quality pose annotations with an error range of only 1.35 mm to 1.74 mm. The dataset also includes 4) 41 large-scale scenes with comprehensive viewpoint and occlusion coverage, 5) a checkerboard-free environment, and 6) dense 6D parallel-jaw robotic grasp annotations. Additionally, we present benchmark results for leading category-level pose estimation networks.