Optimal and Robust Category-level Perception: Object Pose and Shape Estimation from 2D and 3D Semantic Keypoints

We consider a category-level perception problem, where one is given 2D or 3D sensor data picturing an object of a given category (e.g., a car) and has to reconstruct the 3D pose and shape of the object despite intra-class variability (i.e., different car models have different shapes). We consider an active shape model, where, for each object category, we are given a library of potential CAD models describing objects in that category, and we adopt a standard formulation where pose and shape are estimated from 2D or 3D keypoints via non-convex optimization. Our first contribution is to develop PACE3D* and PACE2D*, the first certifiably optimal solvers for pose and shape estimation using 3D and 2D keypoints, respectively. Both solvers rely on the design of tight (i.e., exact) semidefinite relaxations. Our second contribution is to develop outlier-robust versions of both solvers, named PACE3D# and PACE2D#. Towards this goal, we propose ROBIN, a general graph-theoretic framework to prune outliers, which uses compatibility hypergraphs to model the compatibility of measurements. We show that in category-level perception problems these hypergraphs can be built from the winding orders of the keypoints (in 2D) or their convex hulls (in 3D), and that many outliers can be filtered out via maximum hyperclique computation. The last contribution is an extensive experimental evaluation. Besides providing an ablation study on simulated datasets and on the PASCAL3D+ dataset, we combine our solver with a deep keypoint detector and show that PACE3D# improves over the state of the art in vehicle pose estimation on the ApolloScape datasets, and that its runtime is compatible with practical applications. We release our code at https://github.com/MIT-SPARK/PACE.
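To make the hypergraph-based pruning idea concrete, here is a minimal Python sketch of a winding-order compatibility test on keypoint triplets followed by a maximum hyperclique search. It is illustrative only and is not the PACE2D# implementation. It assumes a hypothetical `reference` list of 2D keypoint positions whose winding orders the detections are expected to preserve. How PACE2D# actually derives its compatibility test from the CAD library is not reproduced here. The function names are hypothetical, and the brute-force search stands in for a dedicated maximum-hyperclique solver.

```python
from itertools import combinations


def winding(p, q, r):
    """Winding order of triangle (p, q, r): +1 counterclockwise, -1 clockwise, 0 collinear."""
    cross = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (cross > 0) - (cross < 0)


def compatible_triplets(detections, reference):
    """Build 3-uniform hyperedges: a triplet of detected keypoints is compatible if its
    winding order matches that of the same triplet in the (assumed) reference layout."""
    edges = set()
    for i, j, k in combinations(range(len(detections)), 3):
        w_ref = winding(reference[i], reference[j], reference[k])
        # Skip degenerate (collinear) reference triplets; they carry no orientation.
        if w_ref != 0 and winding(detections[i], detections[j], detections[k]) == w_ref:
            edges.add((i, j, k))
    return edges


def max_hyperclique(n, edges):
    """Largest subset of keypoints whose every triplet is a hyperedge.
    Brute force over subsets, largest first: exponential, so only for small n."""
    for size in range(n, 2, -1):
        for subset in combinations(range(n), size):
            if all(t in edges for t in combinations(subset, 3)):
                return list(subset)
    return []
```

For example, with `reference = [(0, 0), (1, 0), (1, 1), (0, 1)]` and detections in which keypoint 3 has been moved to `(0.5, -1)`, two of the triplets containing keypoint 3 flip orientation. The largest hyperclique is then `[0, 1, 2]`, so keypoint 3 is flagged as an outlier. Because every keypoint in a hyperclique agrees pairwise-and-beyond with every other, the surviving set is mutually consistent, which is why hyperclique search can discard outliers before running the (more expensive) optimal solver.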

翻译：本文研究类别级感知问题：给定描绘某类别物体（例如汽车）的2D或3D传感器数据，需在类内差异（即不同车型具有不同形状）存在的情况下重建物体的3D姿态与形状。采用主动形状模型——针对物体类别，我们拥有描述该类别物体的潜在CAD模型库，并采用标准框架，通过非凸优化从2D或3D关键点估计姿态与形状。首要贡献是开发PACE3D*与PACE2D*，这是首个分别利用3D和2D关键点实现姿态与形状估计的可证明最优求解器。两种求解器均依赖于紧致（即精确）半定松弛的设计。第二个贡献是开发了两种求解器的离群点鲁棒版本——PACE3D#与PACE2D#。为此，我们提出ROBIN：一种基于图论剔除离群点的通用框架，通过兼容性超图对测量值的兼容性进行建模。研究表明，在类别级感知问题中，可通过关键点的缠绕顺序（2D）或凸包（3D）构建此类超图，并利用最大超团计算过滤大量离群点。最后，我们开展广泛实验评估：除在模拟数据集与PASCAL3D+数据集上进行消融研究外，还将求解器与深度关键点检测器结合，证实PACE3D#在ApolloScape数据集车辆姿态估计中超越现有最优方法，且其运行时间满足实际应用需求。相关代码已开源至https://github.com/MIT-SPARK/PACE。