Data augmentation is one of the most effective techniques for regularizing deep learning models and improving their recognition performance across a variety of tasks and domains. However, this is well established only for standard in-domain settings, in which the training and test data follow the same distribution. For the out-of-domain case, where the test data follow a different and unknown distribution, the best recipe for data augmentation is unclear. In this paper, we show that in out-of-domain and domain generalization settings, data augmentation can provide a clear and robust improvement in performance. To this end, we propose a simple training procedure: (i) sample uniformly from standard data augmentation transformations; (ii) increase the strength of the transformations to account for the higher data variance expected when working out-of-domain; and (iii) devise a new reward function to reject extreme transformations that can harm training. With this procedure, our data augmentation scheme achieves accuracy comparable to or better than state-of-the-art methods on benchmark domain generalization datasets. Code: \url{https://github.com/Masseeh/DCAug}
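The three steps of the procedure can be illustrated with a minimal Python sketch. This is not the released DCAug implementation: the toy transformations, the names `augment`, `placeholder_reward`, `max_strength`, and `min_reward`, and in particular the reward itself are assumptions made for illustration only. The abstract does not specify the form of the reward function, so a simple distance-based stand-in is used here.

```python
import random
from typing import Callable, List, Sequence

Sample = List[float]                      # toy "image": pixel values in [0, 1]
Transform = Callable[[Sample, float], Sample]


def _clamp(v: float) -> float:
    return min(1.0, max(0.0, v))


# Hypothetical toy transformations; each takes a sample and a strength m >= 0.
def brightness(x: Sample, m: float) -> Sample:
    return [_clamp(v + m) for v in x]


def contrast(x: Sample, m: float) -> Sample:
    mean = sum(x) / len(x)
    return [_clamp(mean + (1.0 + m) * (v - mean)) for v in x]


def invert_blend(x: Sample, m: float) -> Sample:
    return [_clamp((1.0 - m) * v + m * (1.0 - v)) for v in x]


def placeholder_reward(original: Sample, augmented: Sample) -> float:
    # Assumption: stand-in reward, NOT the paper's reward function.
    # Returns the negative mean absolute change, so larger distortions score lower.
    return -sum(abs(a - b) for a, b in zip(original, augmented)) / len(original)


def augment(
    x: Sample,
    transforms: Sequence[Transform],
    max_strength: float,
    reward_fn: Callable[[Sample, Sample], float],
    min_reward: float,
    rng: random.Random,
) -> Sample:
    op = rng.choice(transforms)            # (i) uniform sampling over transformations
    m = rng.uniform(0.0, max_strength)     # (ii) strength drawn from a widened range
    candidate = op(x, m)
    if reward_fn(x, candidate) < min_reward:
        return x                           # (iii) reject an extreme transformation
    return candidate


if __name__ == "__main__":
    rng = random.Random(0)
    ops = [brightness, contrast, invert_blend]
    sample = [0.1, 0.5, 0.9]
    print(augment(sample, ops, max_strength=0.9,
                  reward_fn=placeholder_reward, min_reward=-0.3, rng=rng))
```

In this sketch a rejected candidate falls back to the unaugmented sample. Resampling a new transformation would be an equally plausible choice, and the abstract does not say which is used.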