Object shape and pose estimation is a foundational robotics problem, supporting tasks from manipulation to scene understanding and navigation. We present a fast local solver for shape and pose estimation which requires only category-level object priors and admits an efficient certificate of global optimality. Given an RGB-D image of an object, we use a learned front-end to detect sparse, category-level semantic keypoints on the target object. We represent the target object's unknown shape using a linear active shape model and pose a maximum a posteriori optimization problem to solve for position, orientation, and shape simultaneously. Expressed in unit quaternions, this problem admits first-order optimality conditions in the form of an eigenvalue problem with eigenvector nonlinearities. Our primary contribution is to solve this problem efficiently with self-consistent field iteration, which only requires computing a 4-by-4 matrix and finding its minimum eigenvalue-vector pair at each iterate. Solving a linear system for the corresponding Lagrange multipliers gives a simple global optimality certificate. One iteration of our solver runs in about 100 microseconds, enabling fast outlier rejection. We test our method on synthetic data and a variety of real-world settings, including two public datasets and a drone tracking scenario. Code is released at https://github.com/MIT-SPARK/Fast-ShapeAndPose.
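To make the self-consistent field (SCF) idea concrete, the following is a minimal sketch of the iteration pattern the abstract describes: build a 4-by-4 symmetric matrix from the current unit-quaternion estimate, take the eigenvector of its minimum eigenvalue as the next estimate, and repeat until the estimate stops changing. The abstract does not state the matrix, so `toy_matrix` is a hypothetical stand-in chosen only so that the example runs and converges; `scf`, its parameters, and the stopping rule are likewise assumptions for illustration, not the released implementation.

```python
import numpy as np


def toy_matrix(q):
    """Hypothetical symmetric 4x4 matrix H(q); NOT the matrix from the paper.

    A fixed symmetric part plus a small q-dependent diagonal term, so the
    eigenvalue problem H(q) q = lambda q is nonlinear in the eigenvector.
    """
    A = np.array([
        [1.0, 0.2, 0.0, 0.1],
        [0.2, 2.0, 0.3, 0.0],
        [0.0, 0.3, 3.0, 0.2],
        [0.1, 0.0, 0.2, 4.0],
    ])
    return A + 0.1 * np.diag(q ** 2)


def scf(build_matrix, q0, max_iters=100, tol=1e-10):
    """Self-consistent field iteration for H(q) q = lambda q with ||q|| = 1.

    Returns the final unit vector, the minimum eigenvalue of the last matrix
    built, and the number of iterations used.
    """
    q = q0 / np.linalg.norm(q0)
    lam = np.nan
    for it in range(1, max_iters + 1):
        H = build_matrix(q)                    # 4x4 matrix at the current iterate
        eigvals, eigvecs = np.linalg.eigh(H)   # eigenvalues in ascending order
        lam, q_new = eigvals[0], eigvecs[:, 0]
        # q and -q encode the same rotation; pick the sign closest to q.
        if q_new @ q < 0:
            q_new = -q_new
        if np.linalg.norm(q_new - q) < tol:    # iterate is self-consistent
            return q_new, lam, it
        q = q_new
    return q, lam, max_iters


# Usage: start from the identity quaternion and check the fixed-point residual.
q_star, lam_star, n_iters = scf(toy_matrix, np.array([1.0, 0.0, 0.0, 0.0]))
residual = np.linalg.norm(toy_matrix(q_star) @ q_star - lam_star * q_star)
print(n_iters, residual)
```

Each pass costs one small symmetric eigendecomposition, which is why a per-iteration cost on the order of microseconds is plausible. Note that a fixed point of this loop only satisfies first-order conditions; the global optimality certificate mentioned in the abstract is a separate check and is not sketched here.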