Event cameras offer high temporal precision, low data rates, and high dynamic range, properties that make them well suited to optical flow estimation. While data-driven optical flow estimation has achieved great success with RGB cameras, its generalization to event cameras is seriously hindered, mainly by limited and biased training data. In this paper, we present BlinkSim, a novel simulator for the fast generation of large-scale data for event-based optical flow. BlinkSim consists of a configurable rendering engine and a flexible event data simulation engine. By leveraging the wealth of existing 3D assets, the rendering engine lets us automatically build thousands of scenes with different objects, textures, and motion patterns, and render very high-frequency images for realistic event data simulation. Based on BlinkSim, we construct BlinkFlow, a large training dataset and evaluation benchmark that contains sufficient, diverse, and challenging event data with optical flow ground truth. Experiments show that BlinkFlow improves the generalization performance of state-of-the-art methods by more than 40% on average and by up to 90%. We further propose an Event optical Flow transFormer (E-FlowFormer) architecture. Powered by BlinkFlow, E-FlowFormer outperforms state-of-the-art (SOTA) methods by up to 91% on the MVSEC dataset and 14% on the DSEC dataset, and achieves the best generalization performance.