Rapid and reliable identification of dynamic scene parts, also known as motion segmentation, is a key challenge for mobile sensors. Contemporary RGB camera-based methods rely on modeling camera and scene properties, but they are often under-constrained and fall short on unknown categories. Event cameras have the potential to overcome these limitations, but corresponding methods have only been demonstrated in smaller-scale indoor environments with simplified dynamic objects. This work presents an event-based method for class-agnostic motion segmentation that can also be deployed successfully in complex, large-scale outdoor environments. To this end, we introduce a novel divide-and-conquer pipeline that combines (a) ego-motion compensated events, computed via a scene understanding module that predicts monocular depth and camera pose as auxiliary tasks, and (b) optical flow from a dedicated optical flow module. These intermediate representations are then fed into a segmentation module that predicts motion segmentation masks. A novel transformer-based temporal attention module within the segmentation module builds correlations across adjacent 'frames' to obtain temporally consistent segmentation masks. Our method sets a new state of the art on the classic EV-IMO benchmark (indoors), where we achieve improvements of 2.19 moving object IoU (2.22 mIoU) and 4.52 point IoU, as well as on a newly generated motion segmentation and tracking benchmark (outdoors) based on the DSEC event dataset, termed DSEC-MOTS, where we show an improvement of 12.91 moving object IoU.
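For readers unfamiliar with the reported metrics, the following is a minimal sketch of how moving object IoU and a two-class mIoU are commonly computed from binary motion masks. It is not taken from this work: the function names `binary_iou` and `mean_iou` are hypothetical, the empty-mask convention is an assumption, and the exact evaluation protocol used on EV-IMO and DSEC-MOTS (including point IoU, which is not shown) may differ.

```python
import numpy as np


def binary_iou(pred, target):
    """Intersection over union of two binary masks (True = moving)."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    union = np.logical_or(pred, target).sum()
    if union == 0:
        # Assumed convention: two empty masks count as perfect agreement.
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(intersection / union)


def mean_iou(pred, target):
    """Average of the IoU for the moving class and the static class."""
    pred = np.asarray(pred, dtype=bool)
    target = np.asarray(target, dtype=bool)
    moving = binary_iou(pred, target)
    static = binary_iou(~pred, ~target)
    return 0.5 * (moving + static)


# Toy 2x4 masks: one overlapping pixel, three pixels in the union.
pred_mask = np.array([[1, 1, 0, 0], [0, 0, 0, 0]])
true_mask = np.array([[0, 1, 1, 0], [0, 0, 0, 0]])
moving_iou = binary_iou(pred_mask, true_mask)
miou = mean_iou(pred_mask, true_mask)
```

In this toy case the moving-class IoU is 1/3 and the static-class IoU is 5/7, so the two-class mean is 11/21. Benchmark scores such as those quoted above are typically these ratios expressed as percentages and averaged over a dataset.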