Estimating camera motion from monocular video is a fundamental problem in computer vision, central to tasks such as SLAM, visual odometry, and structure-from-motion. Existing methods that recover the camera's heading when the rotation is known, whether from an IMU or from an optimization algorithm, tend to perform well in low-noise, low-outlier conditions, but often lose accuracy or become computationally expensive as noise and outlier levels increase. To address these limitations, we propose a novel generalization of the Hough transform on the unit sphere (S(2)) to estimate the camera's heading. First, the method extracts correspondences between two frames and, for each correspondence, generates the great circle of heading directions compatible with it. Then, the unit sphere is discretized using a Fibonacci lattice as bin centers, and each great circle casts votes for a range of directions, so that features unaffected by noise or dynamic objects vote consistently for the correct motion direction. Experimental results on three datasets demonstrate that the proposed method is on the Pareto frontier of accuracy versus efficiency. Additionally, experiments on SLAM show that the proposed method reduces RMSE by correcting the heading during camera pose initialization.
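The voting scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes calibrated unit bearing vectors `f1` and `f2` for the two frames and a known rotation `R` with the convention `X2 = R @ X1 + t`; the function names, the bin count, and the binary vote inside a fixed angular band are all hypothetical choices made for illustration. Under these assumptions, each correspondence constrains the heading to the great circle whose normal is `(R @ f1) x f2`, and every lattice bin close enough to that circle receives one vote.

```python
import numpy as np


def fibonacci_lattice(n_bins):
    """Return n_bins roughly uniformly spaced unit vectors, shape (n_bins, 3)."""
    i = np.arange(n_bins) + 0.5
    z = 1.0 - 2.0 * i / n_bins                    # evenly spaced heights in (-1, 1)
    golden_angle = np.pi * (3.0 - np.sqrt(5.0))   # azimuth increment between points
    phi = golden_angle * i
    r = np.sqrt(1.0 - z * z)
    return np.stack([r * np.cos(phi), r * np.sin(phi), z], axis=1)


def estimate_heading(f1, f2, R, n_bins=4096, band_rad=None):
    """Vote for the translation direction on a Fibonacci lattice.

    f1, f2 : (N, 3) unit bearing vectors in frame 1 and frame 2.
    R      : (3, 3) known rotation, assumed convention X2 = R @ X1 + t.
    band_rad : half-width of the voting band around each great circle.
               Defaults to the approximate bin spacing (an assumption).
    Returns the winning bin center (defined only up to sign) and all vote counts.
    """
    bins = fibonacci_lattice(n_bins)
    if band_rad is None:
        band_rad = np.sqrt(4.0 * np.pi / n_bins)

    # Epipolar constraint with known rotation: t . ((R f1) x f2) = 0,
    # so each correspondence yields a great circle with this normal.
    normals = np.cross(f1 @ R.T, f2)
    norms = np.linalg.norm(normals, axis=1)
    keep = norms > 1e-9                            # drop degenerate correspondences
    normals = normals[keep] / norms[keep, None]

    # A bin is within band_rad of a great circle when |normal . bin| < sin(band_rad).
    near = np.abs(normals @ bins.T) < np.sin(band_rad)
    votes = near.sum(axis=0)
    return bins[np.argmax(votes)], votes
```

Because a great circle contains both a direction and its antipode, the returned heading carries a sign ambiguity; a practical system would need an extra check to resolve it, which this sketch leaves out. Outlier correspondences spread their votes over unrelated great circles, while inliers all pass through the true heading, which is why the peak bin remains stable when some of the matches are wrong.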