Current training pipelines for object recognition omit hue jittering from data augmentation, both because it introduces appearance changes that are detrimental to classification and because its implementation is inefficient in practice. In this study, we investigate the effect of hue variance in the context of video recognition and find it to be beneficial, since static appearance matters less in videos, which also carry motion information. Based on this observation, we propose a data augmentation method for video recognition, named Motion Coherent Augmentation (MCA), that introduces appearance variation in videos and implicitly encourages the model to prioritize motion patterns over static appearances. Concretely, we propose an operation, SwapMix, to efficiently modify the appearance of video samples, and introduce Variation Alignment (VA) to resolve the distribution shift caused by SwapMix, forcing the model to learn appearance-invariant representations. Comprehensive empirical evaluation across various architectures and datasets validates the effectiveness and generalization ability of MCA, as well as the applicability of VA to other augmentation methods. Code is available at https://github.com/BeSpontaneous/MCA-pytorch.
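The abstract names SwapMix and Variation Alignment without giving their exact form, so the following is only a rough sketch of one plausible reading, not the authors' implementation. It assumes that appearance is changed by permuting the RGB channels identically across all frames and blending the result with the original clip, and that alignment is a KL divergence between predictions on the original and augmented clips. The function names `swap_mix` and `alignment_loss`, the mixing weight `lam`, and the `(T, H, W, 3)` layout are all hypothetical.

```python
import numpy as np


def swap_mix(video, lam, rng):
    """Blend a clip with a channel-permuted copy of itself (assumed form).

    video: float array of shape (T, H, W, 3); lam: weight in [0, 1].
    The same permutation is applied to every frame, so frame-to-frame
    changes are transformed consistently and only colour is altered.
    """
    perm = rng.permutation(3)        # random RGB channel order
    swapped = video[..., perm]       # permute the last (channel) axis
    return lam * video + (1.0 - lam) * swapped


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def alignment_loss(logits_orig, logits_aug):
    """Mean KL(p_orig || p_aug) over the batch (assumed alignment term)."""
    p = softmax(logits_orig)
    q = softmax(logits_aug)
    return float(np.mean(np.sum(p * (np.log(p) - np.log(q)), axis=-1)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clip = rng.random((4, 8, 8, 3))          # toy clip: 4 frames of 8x8 RGB
    augmented = swap_mix(clip, lam=0.6, rng=rng)
    logits_a = rng.normal(size=(2, 5))       # stand-ins for model outputs
    logits_b = rng.normal(size=(2, 5))
    print(augmented.shape, alignment_loss(logits_a, logits_b))
```

Under these assumptions the per-pixel sum over channels is unchanged by the blend, which illustrates how colour can vary while spatial structure and motion are kept. The released repository remains the reference for the actual method.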