Current training pipelines in object recognition neglect hue jittering during data augmentation, because it introduces appearance changes that are detrimental to classification and is inefficient to implement in practice. In this study, we investigate the effect of hue variance in the context of video understanding and find it beneficial, since static appearance is less important in videos, which also contain motion information. Based on this observation, we propose a data augmentation method for video understanding, named Motion Coherent Augmentation (MCA), which introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns over static appearance. Concretely, we propose an operation, SwapMix, to efficiently modify the appearance of video samples, and introduce Variation Alignment (VA) to resolve the distribution shift caused by SwapMix, pushing the model to learn appearance-invariant representations. Comprehensive empirical evaluation across various architectures and datasets validates the effectiveness and generalization ability of MCA, as well as the applicability of VA to other augmentation methods. Code is available at https://github.com/BeSpontaneous/MCA-pytorch.
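To make the idea of appearance variation with preserved motion concrete, the following is a minimal illustrative sketch and not the released implementation. The abstract does not say how SwapMix or VA are computed, so the sketch assumes that appearance is changed by permuting the RGB channels of a clip and blending the result with the original, and that alignment is a KL-divergence term between predictions on the original and augmented clips. The function names `swap_channels_and_mix` and `alignment_loss` and the weight `lam` are hypothetical.

```python
import numpy as np


def swap_channels_and_mix(clip, lam, rng):
    """Blend a clip with a channel-permuted copy of itself (assumed design).

    clip: float array of shape (T, H, W, 3), RGB on the last axis.
    lam:  weight in [0, 1] given to the original clip.
    rng:  numpy.random.Generator used to draw the permutation.
    """
    # Draw a permutation of the three channels that is not the identity.
    perm = rng.permutation(3)
    while np.array_equal(perm, np.arange(3)):
        perm = rng.permutation(3)
    # The same permutation is applied to every frame, so colours shift
    # consistently over time and frame-to-frame changes stay coherent.
    swapped = clip[..., perm]
    return lam * clip + (1.0 - lam) * swapped


def alignment_loss(logits_orig, logits_aug):
    """Mean KL(p_orig || p_aug) over the batch (assumed consistency term)."""

    def log_softmax(z):
        z = z - z.max(axis=-1, keepdims=True)  # for numerical stability
        return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

    log_p = log_softmax(logits_orig)
    log_q = log_softmax(logits_aug)
    return float((np.exp(log_p) * (log_p - log_q)).sum(axis=-1).mean())
```

In a training loop, a term like this would be added to the usual classification loss so that predictions on the recoloured clip are pulled toward those on the original; the weighting between the two terms is left open here because the abstract does not give one.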