Optimal and Robust Category-level Perception: Object Pose and Shape Estimation from 2D and 3D Semantic Keypoints

We consider a category-level perception problem, where one is given 2D or 3D sensor data depicting an object of a given category (e.g., a car) and has to reconstruct the 3D pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where, for each object category, we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape are estimated from 2D or 3D keypoints via non-convex optimization. Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation using 3D and 2D keypoints, respectively. Both solvers rely on the design of tight (i.e., exact) semidefinite relaxations. Our second contribution is to develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#. Towards this goal, we propose ROBIN, a general graph-theoretic framework for pruning outliers, which uses compatibility hypergraphs to model the compatibility of measurements. We show that in category-level perception problems these hypergraphs can be built from the winding orders of the keypoints (in 2D) or their convex hulls (in 3D), and that many outliers can be filtered out via maximum hyperclique computation. Our last contribution is an extensive experimental evaluation. Besides providing an ablation study on simulated datasets and on the PASCAL3D+ dataset, we combine our solver with a deep keypoint detector and show that PACE3D# improves over the state of the art in vehicle pose estimation on the ApolloScape datasets, with a runtime compatible with practical applications. We release our code at https://github.com/MIT-SPARK/PACE.
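
To make the outlier-pruning idea concrete, the following is a minimal sketch of a compatibility test for 3D keypoints. It is not the released PACE or ROBIN code. It assumes that the object shape is a convex combination of the library shapes, that each inlier keypoint has noise bounded by `eps`, and that compatibility is checked on pairs of keypoints, so the hypergraph reduces to an ordinary graph and the hyperclique to a clique. The distance between two keypoints is unaffected by the unknown rotation and translation, and under the convex-combination assumption it must lie in an interval computable from the library. All function names, the bounds used, and the toy data are illustrative assumptions.

```python
import math
from itertools import combinations


def _diff(p, q):
    return tuple(a - b for a, b in zip(p, q))


def _norm(v):
    return math.sqrt(sum(a * a for a in v))


def distance_bounds(library, i, j):
    """Bounds on ||b_i - b_j|| over all convex combinations of library shapes."""
    diffs = [_diff(shape[i], shape[j]) for shape in library]
    # The norm is convex, so its maximum over the hull is attained at a vertex.
    upper = max(_norm(d) for d in diffs)
    # Conservative lower bound: project every vertex onto the mean direction.
    mean = tuple(sum(axis) / len(diffs) for axis in zip(*diffs))
    scale = _norm(mean)
    if scale == 0.0:
        return 0.0, upper
    lower = min(sum(m * x for m, x in zip(mean, d)) for d in diffs) / scale
    return max(0.0, lower), upper


def compatibility_edges(library, measured, eps):
    """Connect keypoints whose measured distance is consistent with the library."""
    edges = []
    for i, j in combinations(range(len(measured)), 2):
        lower, upper = distance_bounds(library, i, j)
        dist = _norm(_diff(measured[i], measured[j]))
        if lower - 2 * eps <= dist <= upper + 2 * eps:
            edges.append((i, j))
    return edges


def max_clique(n, edges):
    """Exact maximum clique by branch and bound (fine for a few dozen keypoints)."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    best = []

    def expand(clique, candidates):
        nonlocal best
        if len(clique) > len(best):
            best = clique
        for v in sorted(candidates):
            if len(clique) + len(candidates) <= len(best):
                return  # this branch cannot beat the incumbent
            candidates = candidates - {v}
            expand(clique + [v], candidates & adj[v])

    expand([], set(range(n)))
    return sorted(best)


def prune_outliers(library, measured, eps):
    """Indices of the keypoints kept as mutually compatible."""
    return max_clique(len(measured), compatibility_edges(library, measured, eps))


# Toy library (hypothetical): two "car" shapes with five keypoints each.
LIBRARY = [
    [(0, 0, 0), (2, 0, 0), (2, 1, 0), (0, 1, 0), (1, 0.5, 1)],
    [(0, 0, 0), (3, 0, 0), (3, 1.5, 0), (0, 1.5, 0), (1.5, 0.75, 1.2)],
]
# Average of the two shapes, rotated 90 degrees about z and shifted by (5, -2, 1);
# the last keypoint is replaced by a gross outlier.
MEASURED = [(5, -2, 1), (5, 0.5, 1), (3.75, 0.5, 1), (3.75, -2, 1), (20, 20, 20)]

print(prune_outliers(LIBRARY, MEASURED, eps=0.01))
```

On this toy input the sketch keeps keypoints 0 to 3 and discards keypoint 4. The lower bound is deliberately loose (it never rejects a true inlier, but may let some outliers through), and the surviving keypoints would still have to be passed to a pose and shape solver.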
