We propose a method for dense depth estimation from an event stream generated by sweeping the focal plane of a lens attached to an event camera. In this method, a depth map is inferred from an ``event focal stack'', composed from the event stream, using a convolutional neural network trained on synthesized event focal stacks. The synthetic event streams are created from focal stacks rendered by Blender for arbitrary 3D scenes, which allows training on scenes with diverse structures. Additionally, we explore methods to eliminate the domain gap between real and synthetic event streams. Our method outperforms an image-domain depth-from-defocus method on both synthetic and real datasets.
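The core data-generation step, synthesizing an event focal stack from a rendered focal stack, can be illustrated with a minimal sketch. The snippet below assumes the standard event-camera simulation model: per-pixel log-intensity differences between consecutive focus settings are quantized by a contrast threshold into signed event counts. The function name, the threshold value, and the two-channel (positive/negative polarity) output layout are illustrative assumptions, not the paper's exact configuration.

```python
import numpy as np

def focal_stack_to_event_frames(focal_stack, threshold=0.15, eps=1e-3):
    """Convert a focal stack (N, H, W) of intensity frames into a synthetic
    event focal stack.

    For each pair of consecutive focus settings, emit per-pixel signed event
    counts wherever the log-intensity change crosses the contrast threshold.
    """
    log_stack = np.log(focal_stack.astype(np.float64) + eps)
    ref = log_stack[0].copy()            # per-pixel reference log intensity
    event_frames = []
    for frame in log_stack[1:]:
        diff = frame - ref
        n_events = np.floor(np.abs(diff) / threshold).astype(np.int32)
        pos = np.where(diff > 0, n_events, 0)   # positive-polarity counts
        neg = np.where(diff < 0, n_events, 0)   # negative-polarity counts
        # advance the reference only by the quantized amount that "fired"
        ref += np.sign(diff) * n_events * threshold
        event_frames.append(np.stack([pos, neg], axis=0))  # (2, H, W)
    return np.stack(event_frames, axis=0)  # (N-1, 2, H, W)
```

Under these assumptions, the resulting (N-1, 2, H, W) tensor of polarity-separated event counts can serve as the input representation fed to the depth-estimation network.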