DistractFlow: Improving Optical Flow Estimation via Realistic Distractions and Pseudo-Labeling

We propose a novel data augmentation approach, DistractFlow, for training optical flow estimation models by introducing realistic distractions to the input frames. Based on a mixing ratio, we combine one of the frames in the pair with a distractor image depicting a similar domain, which induces visual perturbations congruent with natural objects and scenes. We refer to such pairs as distracted pairs. Our intuition is that semantically meaningful distractors enable the model to learn related variations and attain robustness against challenging deviations, in contrast to conventional augmentation schemes that focus only on low-level aspects and modifications. More specifically, in addition to the supervised loss computed between the estimated flow for the original pair and its ground-truth flow, we include a second supervised loss defined between the distracted pair's flow and the original pair's ground-truth flow, weighted with the same mixing ratio. Furthermore, when unlabeled data is available, we extend our augmentation approach to self-supervised settings through pseudo-labeling and cross-consistency regularization. Given an original pair and its distracted version, we enforce the estimated flow on the distracted pair to agree with the flow of the original pair. Our approach significantly increases the number of available training pairs without requiring additional annotations. It is agnostic to the model architecture and can be applied to train any optical flow estimation model. Our extensive evaluations on multiple benchmarks, including Sintel, KITTI, and SlowFlow, show that DistractFlow consistently improves existing models, outperforming the latest state of the art.
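As a rough illustration of the two loss terms described above, the following NumPy sketch builds a distracted pair and evaluates both the supervised and the consistency objectives. Several details are assumptions not stated in the abstract: the blend is taken to be a convex combination, the second frame is the one that is mixed, the flow error is a mean L1 distance, and `model` is a hypothetical callable mapping two frames to an `(H, W, 2)` flow field. All function names are illustrative.

```python
import numpy as np


def mix_with_distractor(frame, distractor, lam):
    """Blend a frame with a distractor image.

    Assumed form: a convex combination in which `lam` is the weight kept
    by the original frame and `1 - lam` the weight of the distractor.
    """
    return lam * frame + (1.0 - lam) * distractor


def l1_flow_error(flow_a, flow_b):
    """Mean absolute difference between two flow fields (assumed metric)."""
    return float(np.abs(flow_a - flow_b).mean())


def supervised_loss(model, frame1, frame2, distractor, flow_gt, lam):
    """Loss on the original pair plus a lam-weighted loss on the distracted pair.

    Both terms are measured against the ground truth of the ORIGINAL pair.
    """
    flow_orig = model(frame1, frame2)
    # Assumption: the second frame of the pair is the one that gets mixed.
    frame2_mixed = mix_with_distractor(frame2, distractor, lam)
    flow_dist = model(frame1, frame2_mixed)
    return l1_flow_error(flow_orig, flow_gt) + lam * l1_flow_error(flow_dist, flow_gt)


def consistency_loss(model, frame1, frame2, distractor, lam):
    """Unlabeled case: the original pair's flow serves as a pseudo-label.

    In an autodiff framework the pseudo-label would typically be detached
    so that gradients flow only through the distracted-pair prediction.
    """
    pseudo_label = model(frame1, frame2)
    frame2_mixed = mix_with_distractor(frame2, distractor, lam)
    flow_dist = model(frame1, frame2_mixed)
    return l1_flow_error(flow_dist, pseudo_label)
```

Because the distracted pair reuses the original pair's ground truth or pseudo-label, each extra training pair in this sketch costs only one additional forward pass and no new annotation.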
