Applications from manipulation to autonomous vehicles rely on robust and general object tracking to safely perform tasks in dynamic environments. We propose the first certifiably optimal category-level approach for simultaneous shape estimation and pose tracking of an object of known category (e.g., a car). Our approach uses 3D semantic keypoint measurements extracted from an RGB-D image sequence, and formulates the estimation as a fixed-lag smoothing problem. Temporal constraints enforce the object's rigidity (fixed shape) and smooth motion according to a constant-twist motion model. The solution to this problem provides estimates of the object's state (poses, velocities) and shape (parameterized according to the active shape model) over the smoothing horizon. Our key contribution is to show that, despite the non-convexity of the fixed-lag smoothing problem, we can solve it to certifiable optimality using a small-size semidefinite relaxation. We also present a fast outlier rejection scheme that filters out incorrect keypoint detections with shape and time compatibility tests, and wrap our certifiable solver in a graduated non-convexity scheme. We evaluate the proposed approach on synthetic and real data, showcasing its performance in a table-top manipulation scenario and a drone-based vehicle tracking application.