Event cameras are bio-inspired, motion-activated sensors that show substantial potential in challenging conditions such as motion blur and high dynamic range. In this paper, we propose EVI-SAM to tackle the problem of 6-DoF pose tracking and 3D reconstruction using a monocular event camera. A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging the robustness of feature matching and the precision of direct alignment. Specifically, we develop an event-based 2D-2D alignment to construct the photometric constraint, and tightly integrate it with the event-based reprojection constraint. The mapping module recovers the dense, colored depth of the scene through an image-guided event-based mapping method. Subsequently, the appearance, texture, and surface mesh of the 3D scene can be reconstructed by fusing the dense depth maps from multiple viewpoints using truncated signed distance function (TSDF) fusion. To the best of our knowledge, this is the first non-learning work to realize event-based dense mapping. Evaluations on both publicly available and self-collected datasets demonstrate, qualitatively and quantitatively, the superior performance of our method. EVI-SAM balances accuracy and robustness while remaining computationally efficient, and delivers strong pose tracking and dense mapping performance in challenging scenarios. Video Demo: https://youtu.be/Nn40U4e5Si8.
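To make the TSDF fusion step mentioned in the abstract more concrete, the following is a minimal sketch of how one dense depth map can be folded into a TSDF volume with a weighted running average. It is not the EVI-SAM implementation: the function name `integrate_depth`, the flat voxel layout, the unit per-observation weight, the projective signed distance, and the nearest-pixel depth lookup are all assumptions made for illustration.

```python
import numpy as np


def integrate_depth(tsdf, weight, voxel_centers, depth, K, T_cw, trunc):
    """Fuse one depth map into a TSDF volume (hypothetical helper).

    tsdf, weight  : (N,) float arrays, updated in place
    voxel_centers : (N, 3) voxel centres in world coordinates
    depth         : (H, W) depth map in metres, 0 marks invalid pixels
    K             : (3, 3) pinhole intrinsics
    T_cw          : (4, 4) world-to-camera transform
    trunc         : truncation distance in metres
    """
    H, W = depth.shape

    # Move voxel centres into the camera frame.
    p_cam = voxel_centers @ T_cw[:3, :3].T + T_cw[:3, 3]
    z = p_cam[:, 2]
    in_front = z > 1e-6

    # Pinhole projection with nearest-pixel rounding (assumption).
    z_safe = np.where(in_front, z, 1.0)
    u = np.round(K[0, 0] * p_cam[:, 0] / z_safe + K[0, 2]).astype(int)
    v = np.round(K[1, 1] * p_cam[:, 1] / z_safe + K[1, 2]).astype(int)
    visible = in_front & (u >= 0) & (u < W) & (v >= 0) & (v < H)
    idx = np.flatnonzero(visible)

    # Projective signed distance: positive in front of the observed surface.
    d = depth[v[idx], u[idx]]
    sdf = d - z[idx]

    # Skip invalid pixels and voxels far behind the surface (occluded).
    keep = (d > 0) & (sdf >= -trunc)
    idx = idx[keep]
    obs = np.clip(sdf[keep] / trunc, -1.0, 1.0)

    # Weighted running average with unit observation weight (assumption).
    w_old = weight[idx]
    tsdf[idx] = (w_old * tsdf[idx] + obs) / (w_old + 1.0)
    weight[idx] = w_old + 1.0


# Usage: a column of voxels along the optical axis, observed twice.
centers = np.stack(
    [np.zeros(11), np.zeros(11), np.linspace(0.5, 1.5, 11)], axis=1
)
tsdf = np.ones(11)
weight = np.zeros(11)
K = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 2.0], [0.0, 0.0, 1.0]])
for surface_depth in (1.0, 1.1):
    depth = np.full((4, 4), surface_depth)
    integrate_depth(tsdf, weight, centers, depth, K, np.eye(4), trunc=0.2)
```

After both updates, the zero crossing of `tsdf` along the column lies between the two observed depths, which is where a mesh extractor such as marching cubes would place the surface. A colored reconstruction would carry a per-voxel color average updated in the same way.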