Object Pose Estimation with Statistical Guarantees: Conformal Keypoint Detection and Geometric Uncertainty Propagation

The two-stage object pose estimation paradigm first detects semantic keypoints on the image and then estimates the 6D pose by minimizing reprojection errors. Despite performing well on standard benchmarks, existing techniques offer no provable guarantees on the quality and uncertainty of the estimation. In this paper, we introduce two fundamental changes, namely conformal keypoint detection and geometric uncertainty propagation, into the two-stage paradigm and propose the first pose estimator that endows its estimates with provable and computable worst-case error bounds. On the one hand, conformal keypoint detection applies the statistical machinery of inductive conformal prediction to convert heuristic keypoint detections into circular or elliptical prediction sets that cover the groundtruth keypoints with a user-specified marginal probability (e.g., 90%). On the other hand, geometric uncertainty propagation propagates the geometric constraints on the keypoints to the 6D object pose, leading to a Pose UnceRtainty SEt (PURSE) that is guaranteed to cover the groundtruth pose with the same probability. The PURSE, however, is a nonconvex set that does not directly yield an estimated pose or its uncertainty. Therefore, we develop RANdom SAmple averaGing (RANSAG) to compute an average pose and apply semidefinite relaxation to upper bound the worst-case errors between the average pose and the groundtruth. On the LineMOD Occlusion dataset we demonstrate that: (i) the PURSE covers the groundtruth with valid probabilities; (ii) the worst-case error bounds provide correct uncertainty quantification; and (iii) the average pose achieves accuracy better than or comparable to that of representative methods based on sparse keypoints.
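
To make the conformal calibration step concrete, the following is a minimal sketch of inductive (split) conformal prediction for a single keypoint, producing a circular prediction set. It is an illustration under stated assumptions, not the paper's implementation: the nonconformity score is assumed to be the plain Euclidean pixel distance between detection and groundtruth, and the names `conformal_radius` and `in_prediction_set` are hypothetical. Given n calibration scores and miscoverage level alpha, the radius is the ceil((n + 1)(1 - alpha))-th smallest score, which yields marginal coverage of at least 1 - alpha when calibration and test samples are exchangeable.

```python
import math
import numpy as np


def conformal_radius(cal_pred, cal_true, alpha=0.1):
    """Calibrate a ball radius with split conformal prediction.

    cal_pred, cal_true: (n, 2) arrays of detected and groundtruth pixel
    coordinates on a held-out calibration set (assumed exchangeable with
    test data). The Euclidean-distance score is an assumption made for
    this sketch.
    """
    cal_pred = np.asarray(cal_pred, dtype=float)
    cal_true = np.asarray(cal_true, dtype=float)
    # Nonconformity score: pixel distance between detection and groundtruth.
    scores = np.linalg.norm(cal_pred - cal_true, axis=1)
    n = scores.shape[0]
    # Rank of the finite-sample-corrected quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration samples for this alpha: only the trivial
        # (unbounded) set carries the guarantee.
        return math.inf
    return float(np.sort(scores)[k - 1])


def in_prediction_set(detection, radius, point):
    """Check whether `point` lies in the ball of `radius` around `detection`."""
    diff = np.asarray(point, dtype=float) - np.asarray(detection, dtype=float)
    return bool(np.linalg.norm(diff) <= radius)
```

In use, the radius would be calibrated once per keypoint on held-out data and then attached to every new detection. The resulting balls are the kind of keypoint constraints that the geometric uncertainty propagation step carries over to the 6D pose.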
