Event cameras are bio-inspired, motion-activated sensors that show substantial potential in challenging conditions such as motion blur and high dynamic range. In this paper, we propose EVI-SAM to tackle the problem of 6-DoF pose tracking and 3D reconstruction using a monocular event camera. A novel event-based hybrid tracking framework is designed to estimate the pose, leveraging the robustness of feature matching and the precision of direct alignment. Specifically, we develop an event-based 2D-2D alignment to construct the photometric constraint, and tightly integrate it with the event-based reprojection constraint. The mapping module recovers dense, colored depth of the scene through an image-guided event-based mapping method. Subsequently, the appearance, texture, and surface mesh of the 3D scene can be reconstructed by fusing the dense depth maps from multiple viewpoints using truncated signed distance function (TSDF) fusion. To the best of our knowledge, this is the first non-learning work to realize event-based dense mapping. Evaluations on both publicly available and self-collected datasets demonstrate, qualitatively and quantitatively, the superior performance of our method. EVI-SAM effectively balances accuracy and robustness while maintaining computational efficiency, showcasing superior pose tracking and dense mapping performance in challenging scenarios. Video Demo: https://youtu.be/Nn40U4e5Si8.
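To make the TSDF fusion step mentioned above more concrete, the following is a minimal sketch of the standard weighted running-average TSDF update, reduced to voxels along a single camera ray. It is not the EVI-SAM implementation: the function names (`fuse`, `zero_crossing`), the 1D voxel layout, the truncation distance, and the depth values are all assumptions chosen for illustration.

```python
def fuse(tsdf, weights, voxel_depths, measured_depth, trunc, obs_weight=1.0):
    """Integrate one depth measurement into the voxels along a ray (in place).

    Hypothetical 1D simplification: each voxel is identified by its depth
    along the ray, and the signed distance is positive in front of the surface.
    """
    for i, z in enumerate(voxel_depths):
        sdf = measured_depth - z
        if sdf < -trunc:
            continue  # voxel lies far behind the surface: treat as unobserved
        d = min(trunc, sdf)  # truncate distances in front of the surface
        w_new = weights[i] + obs_weight
        # Weighted running average of the truncated signed distance.
        tsdf[i] = (weights[i] * tsdf[i] + obs_weight * d) / w_new
        weights[i] = w_new


def zero_crossing(tsdf, weights, voxel_depths):
    """Return the interpolated depth where the fused TSDF changes sign, or None."""
    for i in range(len(tsdf) - 1):
        observed = weights[i] > 0 and weights[i + 1] > 0
        if observed and tsdf[i] > 0 >= tsdf[i + 1]:
            t = tsdf[i] / (tsdf[i] - tsdf[i + 1])  # linear interpolation factor
            return voxel_depths[i] + t * (voxel_depths[i + 1] - voxel_depths[i])
    return None


# Assumed setup: voxels every 0.1 m from 0.0 m to 3.0 m along the ray.
voxel_depths = [0.1 * k for k in range(31)]
tsdf = [0.0] * len(voxel_depths)
weights = [0.0] * len(voxel_depths)

# Three noisy depth observations of the same surface (made-up values).
for depth in (2.02, 2.06, 2.07):
    fuse(tsdf, weights, voxel_depths, depth, trunc=0.3)

surface = zero_crossing(tsdf, weights, voxel_depths)
```

In this toy setup the recovered zero crossing sits at the mean of the three observations, which illustrates how averaging truncated distances across viewpoints smooths per-view depth noise. A full system would apply the same update over a 3D voxel grid, project each voxel into every depth map using the estimated camera pose, and extract a mesh from the zero level set.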