Feature-based approaches have long been used for camera-based robot perception tasks such as localization, mapping, and tracking. Several of these approaches also fuse other sensors, such as inertial sensing, to perform combined state estimation. Our work rethinks this approach: we present a representation learning mechanism that identifies the visual features that best correspond to robot motion as estimated by an external signal. Specifically, we obtain the robot's transformations from an external signal (inertial sensing, for example) and attend to the regions of image space that are most consistent with that signal. We use a pairwise consistency metric as the representation, keeping visual features consistent with the robot's relative pose transformations across a sequence. This lets us incorporate information from the robot's perspective instead of relying solely on image attributes. We evaluate our approach on real-world datasets such as KITTI and EuRoC, compare the refined features with existing feature descriptors, and further evaluate the method in our own real-robot experiment. We observe an average 49% reduction in the image search space without compromising trajectory estimation accuracy. Our method reduces the execution time of visual odometry by 4.3% and also reduces reprojection errors. We demonstrate the need to select only the most important features and show that the approach is competitive against various feature detection baselines.
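To make the idea of pairwise consistency with an external motion signal concrete, the following is a minimal sketch, not the method evaluated in this work. The abstract does not define the metric, so the sketch substitutes a standard geometric stand-in: the Sampson epipolar distance of each feature match under the relative pose (R, t) reported by the external sensor. The function names `sampson_residuals` and `select_consistent`, the `keep_ratio` parameter, and the pose convention X2 = R X1 + t are all assumptions made for illustration, and the learned component of the approach is not represented.

```python
import numpy as np

def skew(t):
    """Cross-product matrix [t]_x, so that skew(t) @ v == np.cross(t, v)."""
    return np.array([[0.0, -t[2], t[1]],
                     [t[2], 0.0, -t[0]],
                     [-t[1], t[0], 0.0]])

def sampson_residuals(pts1, pts2, K, R, t):
    """Sampson distance of each match under an externally supplied pose.

    pts1, pts2: (N, 2) pixel coordinates of matched features in frames 1 and 2.
    K: (3, 3) camera intrinsics. R, t: relative pose with X2 = R @ X1 + t
    (assumed convention; t is only meaningful up to scale here).
    """
    E = skew(t) @ R                       # essential matrix from the pose
    K_inv = np.linalg.inv(K)
    # Homogeneous normalized image coordinates, one row per feature.
    x1 = (K_inv @ np.c_[pts1, np.ones(len(pts1))].T).T
    x2 = (K_inv @ np.c_[pts2, np.ones(len(pts2))].T).T
    Ex1 = x1 @ E.T                        # rows are E @ x1 (epipolar lines in frame 2)
    Etx2 = x2 @ E                         # rows are E.T @ x2 (epipolar lines in frame 1)
    num = np.sum(x2 * Ex1, axis=1) ** 2   # (x2^T E x1)^2
    den = Ex1[:, 0] ** 2 + Ex1[:, 1] ** 2 + Etx2[:, 0] ** 2 + Etx2[:, 1] ** 2
    return num / den

def select_consistent(pts1, pts2, K, R, t, keep_ratio=0.5):
    """Indices of the matches most consistent with the external pose."""
    res = sampson_residuals(pts1, pts2, K, R, t)
    k = max(1, int(round(keep_ratio * len(res))))
    return np.sort(np.argsort(res)[:k])
```

In this sketch, discarding the high-residual matches plays the role of shrinking the image search space: only features whose apparent motion agrees with the externally measured motion are passed on to pose estimation. A fixed `keep_ratio` is the simplest possible selection rule; a threshold on the residual would be an equally reasonable choice.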