Volumetric reconstruction of dynamic scenes is an important problem in computer vision; it is especially challenging in poor lighting and with fast motion. This is partly due to the limitations of RGB cameras: to capture fast motion without excessive blur, the frame rate must be increased, which in turn requires more lighting. In contrast, event cameras, which record changes in pixel brightness asynchronously, are much less dependent on lighting, making them more suitable for recording fast motion. We therefore propose the first method to spatiotemporally reconstruct a scene from sparse multi-view event streams and sparse RGB frames. We train a sequence of cross-faded time-conditioned NeRF models, one per short recording segment. The individual segments are supervised with a set of event- and RGB-based losses and sparse-view regularisation. We assemble a real-world multi-view camera rig with six static event cameras around the object and record a benchmark multi-view event-stream dataset of challenging motions. Our method outperforms RGB-based baselines, producing state-of-the-art results, and opens up multi-view event-based reconstruction as a new path for fast scene capture beyond RGB cameras. The code and the data will be released soon at https://4dqv.mpi-inf.mpg.de/DynEventNeRF/
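The cross-fading of per-segment time-conditioned models can be sketched as follows. This is a minimal illustrative assumption of how adjacent segment models might be blended over an overlap window so the reconstruction stays temporally continuous; the segment boundaries, overlap width, linear ramp, and toy "models" below are hypothetical, not the paper's actual implementation.

```python
import numpy as np

def crossfade_weight(t, seg_end, overlap):
    """Weight of the *next* segment's model at time t.
    Zero before the overlap window, then ramps linearly to 1 at seg_end."""
    return float(np.clip((t - (seg_end - overlap)) / overlap, 0.0, 1.0))

def blended_prediction(t, model_a, model_b, seg_end, overlap):
    """Cross-fade two time-conditioned models around a segment boundary."""
    w = crossfade_weight(t, seg_end, overlap)
    return (1.0 - w) * model_a(t) + w * model_b(t)

# Toy stand-ins for time-conditioned radiance models (here: scalar outputs).
model_a = lambda t: 1.0   # segment A's prediction
model_b = lambda t: 2.0   # segment B's prediction

# Before the overlap window only model A contributes; at the boundary only B.
print(blended_prediction(0.5, model_a, model_b, seg_end=1.0, overlap=0.2))  # 1.0
print(blended_prediction(1.0, model_a, model_b, seg_end=1.0, overlap=0.2))  # 2.0
```

In a real pipeline, `model_a` and `model_b` would be neural radiance fields queried at a 3D point and time, and the blend would be applied per sample before volume rendering.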