Towards Anytime Optical Flow Estimation with Event Cameras

Optical flow estimation is a fundamental task in autonomous driving. Event cameras respond to log-brightness changes with microsecond latency, and because they respond only to regions where brightness changes, they are particularly well suited to optical flow estimation. However, despite this ultra-low-latency response, existing event camera datasets provide optical flow ground truth only at a limited frame rate (e.g., 10 Hz), which greatly restricts the potential of event-driven optical flow. To address this challenge, we propose the Unified Voxel Grid, a high-frame-rate, low-latency event representation that is fed into the network sequentially, bin by bin. We then propose EVA-Flow, an EVent-based Anytime Flow estimation network that produces high-frame-rate event optical flow while using only low-frame-rate optical flow ground truth for supervision. The key component of EVA-Flow is the stacked Spatiotemporal Motion Refinement (SMR) module, which predicts temporally dense optical flow and improves its accuracy through spatiotemporal motion refinement. The time-dense feature warping used in the SMR module provides implicit supervision for the intermediate optical flow. Additionally, we introduce the Rectified Flow Warp Loss (RFWL) for the unsupervised evaluation of intermediate optical flow in the absence of ground truth. To the best of our knowledge, this is the first work to focus on anytime optical flow estimation with event cameras. Comprehensive experiments on MVSEC, DSEC, and our EVA-FlowSet demonstrate that EVA-Flow achieves competitive performance, ultra-low latency (5 ms), the fastest inference (9.2 ms), time-dense motion estimation (200 Hz), and strong generalization. Our code will be available at https://github.com/Yaozhuwa/EVA-Flow.
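A conventional event voxel grid can illustrate the bin-by-bin input described above. The following is a minimal NumPy sketch. It assumes the widely used voxel grid of Zhu et al. (2019), in which each event's polarity is split linearly between its two temporally nearest bins. It is not the paper's Unified Voxel Grid, because the abstract does not specify how that representation is constructed. The names `events_to_voxel_grid` and `iterate_bins` are hypothetical.

```python
import numpy as np


def events_to_voxel_grid(xs, ys, ts, ps, num_bins, height, width):
    """Accumulate events into a (num_bins, height, width) voxel grid.

    Assumption: the common voxel grid of Zhu et al. (2019), not the paper's
    Unified Voxel Grid. Each event's polarity (+1 / -1) is split between its
    two temporally nearest bins with linear weights, so every event
    contributes a total weight of exactly 1.
    """
    voxel = np.zeros((num_bins, height, width))
    if len(ts) == 0:
        return voxel
    t0, t1 = ts.min(), ts.max()
    # Map timestamps onto the continuous bin axis [0, num_bins - 1].
    t_norm = (num_bins - 1) * (ts - t0) / max(t1 - t0, 1e-9)
    pol = np.where(ps > 0, 1.0, -1.0)
    left = np.floor(t_norm).astype(int)
    frac = t_norm - left
    for offset, weight in ((0, 1.0 - frac), (1, frac)):
        b = left + offset
        # Events at the final timestamp have no bin to their right.
        valid = b < num_bins
        np.add.at(voxel, (b[valid], ys[valid], xs[valid]),
                  pol[valid] * weight[valid])
    return voxel


def iterate_bins(voxel):
    """Yield one (height, width) slice at a time, oldest bin first,
    in the order a streaming network would consume them."""
    for b in range(voxel.shape[0]):
        yield voxel[b]
```

Under this assumption, feeding the grid to a recurrent network one slice at a time would let it emit a flow estimate after every bin rather than once per full window. This is the kind of time-dense output the abstract describes.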
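The name suggests that RFWL builds on the Flow Warp Loss (FWL) of Stoffregen et al. (ECCV 2020). FWL scores a flow field without ground truth: it warps events to a reference time and compares the sharpness (pixel variance) of the resulting image of warped events with that of the unwarped events. The following sketch implements only the plain FWL ratio. It assumes a constant per-pixel flow in pixels per unit time and bilinear splatting of event counts. The abstract does not give the rectification that RFWL adds, so that step is not shown. The function names are hypothetical.

```python
import numpy as np


def warped_event_image(xs, ys, ts, flow, t_ref, height, width):
    """Warp events to time t_ref along a per-pixel flow and splat their
    counts bilinearly into an image of warped events (IWE).

    Assumption: flow has shape (2, height, width) and holds (u, v)
    velocities in pixels per unit time; each event uses the flow at its
    own pixel.
    """
    u = flow[0, ys, xs]
    v = flow[1, ys, xs]
    # Move each event along its flow to the reference time t_ref.
    xw = xs + (t_ref - ts) * u
    yw = ys + (t_ref - ts) * v
    x0 = np.floor(xw).astype(int)
    y0 = np.floor(yw).astype(int)
    fx, fy = xw - x0, yw - y0
    img = np.zeros((height, width))
    # Distribute each event's unit count over its four neighbouring pixels,
    # dropping contributions that fall outside the image.
    for dx, dy, w in ((0, 0, (1 - fx) * (1 - fy)), (1, 0, fx * (1 - fy)),
                      (0, 1, (1 - fx) * fy), (1, 1, fx * fy)):
        xi, yi = x0 + dx, y0 + dy
        inside = (xi >= 0) & (xi < width) & (yi >= 0) & (yi < height)
        np.add.at(img, (yi[inside], xi[inside]), w[inside])
    return img


def flow_warp_loss(xs, ys, ts, flow, height, width):
    """Plain FWL: variance of the IWE under `flow` divided by the variance
    of the IWE under zero flow. Values above 1 mean the flow sharpens
    the event image."""
    t_ref = ts.min()
    sharp = warped_event_image(xs, ys, ts, flow, t_ref, height, width)
    raw = warped_event_image(xs, ys, ts, np.zeros_like(flow), t_ref,
                             height, width)
    return sharp.var() / max(raw.var(), 1e-12)
```

A value above 1 indicates that the flow aligns events better than zero motion does. Consider a vertical edge that moves right at 1 pixel per unit time: warping it with the true flow collapses all of its events onto a single column and gives a ratio well above 1. An all-zero flow gives exactly 1.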

翻译：光流估计是自动驾驶领域的一项基础任务。事件相机能够以微秒级响应对数亮度变化，其仅对变化区域产生响应的特性特别适用于光流估计。然而，与事件相机超低延迟的响应速度相比，现有通过事件相机采集的数据集仅提供有限帧率的光流真值（例如10Hz），这极大地限制了事件驱动光流的潜力。为应对这一挑战，我们提出了一种高帧率、低延迟的事件表示——统一体素网格（Unified Voxel Grid），该表示按序逐bin输入网络。随后我们提出EVA-Flow，一种基于事件的任意时刻光流估计网络，该网络仅使用低帧率光流真值进行监督，即可生成高帧率事件光流。EVA-Flow的核心组件是堆叠式时空运动精化（SMR）模块，该模块通过预测时间密集光流并借助时空运动精化提升精度。SMR模块中采用的时间密集特征扭曲为中间光流提供了隐式监督。此外，我们引入了校正光流扭曲损失（RFWL），用于在无真值情况下对中间光流进行无监督评估。据我们所知，这是首个聚焦于基于事件相机进行任意时刻光流估计的工作。在MVSEC、DESC及我们提出的EVA-FlowSet上开展的全面实验表明，EVA-Flow取得了具有竞争力的性能、超低延迟（5ms）、最快推理速度（9.2ms）、时间密集运动估计（200Hz）及强泛化能力。我们的代码将开源至https://github.com/Yaozhuwa/EVA-Flow。