Visual-inertial initialization methods can be classified as joint or disjoint. Joint approaches estimate the visual and inertial parameters together: they align observations from feature-bearing points using IMU integration, then apply a closed-form solution based on visual and acceleration observations to recover the initial velocity and gravity. In contrast, disjoint approaches solve the Structure from Motion (SFM) problem independently and then determine the inertial parameters from the up-to-scale camera poses produced by pure monocular SLAM. However, previous disjoint methods have limitations, such as assuming that the acceleration bias has negligible impact or that pure monocular SLAM provides accurate rotation estimates. To address these issues, we propose EDI, a novel approach for fast, accurate, and robust visual-inertial initialization. Our method uses an Error-state Kalman Filter (ESKF) to estimate the gyroscope bias and correct the rotation estimates from monocular SLAM, removing the dependence on pure monocular SLAM for rotation estimation. To estimate the scale factor without prior information, we provide a closed-form solution for the initial velocity, scale, gravity, and acceleration bias. To address the coupling between gravity and acceleration bias, we introduce weights in the linear least-squares equations, which ensure that the acceleration bias is observable and handle outliers. Extensive evaluation on the EuRoC dataset shows that our method achieves an average scale error of 5.8% in less than 3 seconds, outperforming other state-of-the-art disjoint visual-inertial initialization approaches, even in challenging environments and with artificial noise corruption.
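The idea of introducing weights into a linear least-squares system to suppress outliers can be illustrated generically. The sketch below is not the EDI formulation: the abstract does not specify the weighting scheme or the structure of the linear system, so Huber-style iteratively reweighted least squares is used here as a stand-in. The function names, the `delta` threshold, and the iteration count are all hypothetical.

```python
import numpy as np

def weighted_least_squares(A, b, w):
    """Solve min_x sum_i w[i] * (A[i] @ x - b[i])**2 for non-negative weights w."""
    sqrt_w = np.sqrt(np.asarray(w, dtype=float))
    # Scaling each row by sqrt(w[i]) turns the weighted problem into an ordinary one.
    x, *_ = np.linalg.lstsq(A * sqrt_w[:, None], b * sqrt_w, rcond=None)
    return x

def huber_weights(residuals, delta):
    """Assumed weighting rule: 1 for small residuals, delta/|r| for large ones."""
    r = np.abs(residuals)
    return np.where(r <= delta, 1.0, delta / np.maximum(r, 1e-12))

def robust_solve(A, b, delta=1.0, iterations=10):
    """Iteratively reweighted least squares; rows with large residuals lose influence."""
    w = np.ones(len(b))
    for _ in range(iterations):
        x = weighted_least_squares(A, b, w)
        w = huber_weights(A @ x - b, delta)
    return x
```

In an initialization setting, the unknown vector `x` would stack quantities such as initial velocity, scale, gravity, and bias, with each row of `A` and `b` coming from one measurement constraint. That mapping is an assumption for illustration and is not taken from the paper.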