The estimation of implicit cross-frame correspondences and the high computational cost have long been major challenges in video semantic segmentation (VSS) for driving scenes. Prior works utilize keyframes, feature propagation, or cross-frame attention to address these issues. By contrast, we are the first to harness vanishing point (VP) priors for more effective segmentation. Intuitively, objects near VPs (i.e., away from the vehicle) are less discernible. Moreover, they tend to move radially away from the VP over time in the usual case of a forward-facing camera, a straight road, and linear forward motion of the vehicle. Our novel, efficient network for VSS, named VPSeg, incorporates two modules that utilize exactly this pair of static and dynamic VP priors: sparse-to-dense feature mining (DenseVP) and VP-guided motion fusion (MotionVP). MotionVP employs VP-guided motion estimation to establish explicit correspondences across frames and help attend to the most relevant features from neighboring frames, while DenseVP enhances weak dynamic features in distant regions around VPs. These modules operate within a context-detail framework, which separates contextual features from high-resolution local features at different input resolutions to reduce computational costs. Contextual and local features are integrated through contextualized motion attention (CMA) for the final prediction. Extensive experiments on two popular driving segmentation benchmarks, Cityscapes and ACDC, demonstrate that VPSeg outperforms previous SOTA methods, with only modest computational overhead.
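The dynamic VP prior described above (under forward motion, scene points move radially away from the VP) can be illustrated with a minimal sketch. This is not VPSeg's MotionVP module; it only computes the expected motion direction at each pixel, assuming the VP pixel location is already known. The function name `radial_direction_field` and its parameters are hypothetical and chosen for illustration.

```python
import numpy as np


def radial_direction_field(height, width, vp_xy):
    """Return per-pixel unit vectors pointing away from the vanishing point.

    Assumption (illustrative only): vp_xy = (x, y) is the VP location in
    pixel coordinates, with x to the right and y downward.
    Outputs: directions of shape (height, width, 2) holding (dx, dy),
    and dist of shape (height, width) holding the distance to the VP.
    """
    vx, vy = vp_xy
    ys, xs = np.mgrid[0:height, 0:width].astype(np.float64)
    dx = xs - vx
    dy = ys - vy
    dist = np.hypot(dx, dy)
    # The direction is undefined exactly at the VP; keep it as a zero vector.
    safe = np.where(dist == 0, 1.0, dist)
    directions = np.stack([dx / safe, dy / safe], axis=-1)
    return directions, dist


if __name__ == "__main__":
    directions, dist = radial_direction_field(4, 6, (2.0, 1.0))
    print(directions[1, 5])  # pixel to the right of the VP
    print(dist[1, 5])
```

A field like this could serve as a directional constraint when searching for cross-frame correspondences, and `dist` gives a simple proxy for the static prior, since pixels with a small distance to the VP correspond to distant, less discernible regions.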