Event cameras do not produce images but a continuous stream of events, which encode changes in illumination for each pixel independently and asynchronously. While their output is temporally rich, it carries no depth information, which would facilitate their use alongside other sensors. LiDARs can provide this depth information, but their measurements are inherently sparse, which complicates the depth-to-event association. Furthermore, since events represent changes in illumination, they may also correspond to changes in depth; associating each event with a single depth is therefore inadequate. In this work, we address these issues by fusing information from an event camera and a LiDAR using a learning-based approach to estimate accurate dense depth maps. To solve the "potential change of depth" problem, we estimate two depth maps at each step: one "before" the events happen and one "after" the events happen. We further use this pair of depth maps to compute a depth difference for each event, giving it more context. We train and evaluate our network, ALED, on both synthetic and real driving sequences, and show that it predicts dense depths with an error reduction of up to 61% compared to the current state of the art. We also demonstrate the quality of our 2-depths-to-event association and the usefulness of the depth difference information. Finally, we release SLED, a novel synthetic dataset comprising events, LiDAR point clouds, RGB images, and dense depth maps.
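The per-event depth difference described in the abstract can be illustrated with a minimal sketch. It assumes the two dense depth maps (the "before" and "after" predictions) are already available as 2-D arrays, and that each event carries integer pixel coordinates `(x, y)`. The function name `event_depth_differences`, the array layout, and the "after minus before" sign convention are all assumptions made for illustration; the abstract does not specify them, and this is not the ALED implementation.

```python
import numpy as np


def event_depth_differences(depth_before, depth_after, events_xy):
    """Look up both depths at each event's pixel (hypothetical helper).

    depth_before, depth_after: (H, W) dense depth maps, e.g. in meters.
    events_xy: (N, 2) integer array of event pixel coordinates, as (x, y).
    Returns (d_before, d_after, diff), each of shape (N,), where
    diff = d_after - d_before (sign convention assumed here).
    """
    if depth_before.shape != depth_after.shape:
        raise ValueError("depth maps must have the same shape")
    xs = events_xy[:, 0].astype(np.intp)
    ys = events_xy[:, 1].astype(np.intp)
    # Images are indexed [row, column], i.e. [y, x].
    d_before = depth_before[ys, xs]
    d_after = depth_after[ys, xs]
    return d_before, d_after, d_after - d_before


# Toy 2x3 example: one pixel moves farther away, one moves closer.
depth_before = np.array([[10.0, 10.0, 10.0],
                         [20.0, 20.0, 20.0]])
depth_after = np.array([[10.0, 12.0, 10.0],
                        [20.0, 20.0, 5.0]])
events_xy = np.array([[1, 0], [2, 1], [0, 0]])  # (x, y) per event

d_before, d_after, diff = event_depth_differences(
    depth_before, depth_after, events_xy
)
# diff is [2.0, -15.0, 0.0]: the second event sits on a large depth change,
# which a single depth value per event could not express.
```

Under this convention, a difference near zero suggests the event was not caused by a change in depth at that pixel, while a large magnitude flags a depth change, such as an object boundary passing over the pixel.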