In this work, we present a lightweight, tightly coupled system that combines a deep depth network with visual-inertial odometry (VIO) and provides accurate state estimates and dense depth maps of the immediate surroundings. We use the proposed lightweight Conditional Variational Autoencoder (CVAE) for depth inference and encoding, and we feed the network previously marginalized sparse features from VIO to improve both the accuracy of the initial depth prediction and the network's generalization capability. The compact encoded depth maps are then updated jointly with the navigation states in a sliding-window estimator to provide dense local scene geometry. We additionally propose a novel method to obtain the CVAE's Jacobian, which is shown to be more than an order of magnitude faster than in previous works, and we leverage First-Estimate Jacobians (FEJ) to avoid recalculating it. In contrast to previous works that rely on completely dense residuals, we propose to update the depth code with sparse measurements only, and we show through careful experimentation that our choice of sparse measurements and FEJs can still significantly improve the estimated depth maps. Our full system also exhibits state-of-the-art pose estimation accuracy, and we show that it can run in real time with single-threaded execution, using GPU acceleration only for the network and the code Jacobian.
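To make the idea of updating a compact depth code from sparse measurements more concrete, the following is a minimal sketch and not the system's actual implementation. It assumes a toy decoder that is linear in the code, a zero-mean Gaussian prior on the code, and a stand-alone update of the code only, whereas the abstract describes a joint update with the navigation states in a sliding-window estimator. The function `update_depth_code` and all of its parameters are hypothetical names introduced for illustration. The decoder Jacobian is evaluated once at the first estimate and then reused, in the spirit of FEJ.

```python
import numpy as np


def update_depth_code(code0, depth0, jac, pix_idx, z_meas,
                      meas_sigma=0.01, prior_sigma=1.0):
    """One Gauss-Newton step on a depth code from sparse depth measurements.

    code0   : (d,)   code at the linearization point (the first estimate)
    depth0  : (N,)   dense depth decoded at code0
    jac     : (N, d) decoder Jacobian evaluated once at code0 and reused
    pix_idx : (m,)   indices of the pixels that have a sparse measurement
    z_meas  : (m,)   measured depths at those pixels
    """
    H = jac[pix_idx]                  # only the rows for measured pixels
    r = z_meas - depth0[pix_idx]      # sparse depth residuals
    # Normal equations: measurement term plus an assumed zero-mean code prior.
    A = H.T @ H / meas_sigma**2 + np.eye(code0.size) / prior_sigma**2
    b = H.T @ r / meas_sigma**2 - code0 / prior_sigma**2
    delta = np.linalg.solve(A, b)
    # The dense map is refreshed through the same fixed Jacobian.
    return code0 + delta, depth0 + jac @ delta


# Toy problem (all values are made up for illustration).
rng = np.random.default_rng(0)
n_pixels, code_dim, n_sparse = 200, 8, 30
jac = 0.5 * rng.standard_normal((n_pixels, code_dim))
depth0 = np.full(n_pixels, 2.0)       # initial dense prediction
code0 = np.zeros(code_dim)
code_true = 0.3 * rng.standard_normal(code_dim)
depth_true = depth0 + jac @ code_true

pix_idx = rng.choice(n_pixels, size=n_sparse, replace=False)
z_meas = depth_true[pix_idx]          # noise-free sparse measurements

code1, depth1 = update_depth_code(code0, depth0, jac, pix_idx, z_meas)
err_before = np.abs(depth0 - depth_true).mean()
err_after = np.abs(depth1 - depth_true).mean()
print(f"mean abs depth error: before={err_before:.4f}, after={err_after:.6f}")
```

In this toy setting, measurements at 30 of 200 pixels are enough to correct the whole dense map, because every pixel depends on the same low-dimensional code. With a real nonlinear network the linearization is only approximate, so the improvement would depend on how close the first estimate is to the true code.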