This paper presents a robust approach to visual parallel tracking and mapping (PTAM) that excels in challenging environments. The proposed method combines the strengths of heterogeneous multi-modal visual sensors, namely stereo event-based and frame-based sensors, in a unified reference frame through a novel spatio-temporal synchronization of stereo visual frames and stereo event streams. To further enhance robustness, we employ deep learning-based feature extraction and description for estimation. We also introduce an end-to-end parallel tracking and mapping optimization layer, complemented by a simple loop-closure algorithm for efficient SLAM behavior. In comprehensive experiments on both small-scale and large-scale real-world sequences from the VECtor and TUM-VIE benchmarks, the proposed method (DH-PTAM) demonstrates superior robustness and accuracy in adverse conditions, especially in large-scale HDR scenarios. The research-oriented Python API of our implementation is publicly available on GitHub for further research and development: https://github.com/AbanobSoliman/DH-PTAM.