Mixup is a data augmentation technique that trains on random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive in the training of state-of-the-art image classification models because of its demonstrated benefits over empirical risk minimization with respect to generalization and robustness. In this work, we seek to explain some of this success from a feature learning perspective. We focus on classification problems in which each class may have multiple associated features (or views), each of which can be used to predict the class correctly. Our main theoretical results demonstrate that, for a non-trivial class of data distributions with two features per class, training a two-layer convolutional network using empirical risk minimization can lead to learning only one feature for almost all classes, while training with a specific instantiation of Mixup succeeds in learning both features for every class. We also show empirically that these theoretical insights extend to the practical setting of image benchmarks modified to have multiple features.
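As a concrete illustration of "random convex combinations of data points and their labels", the following is a minimal sketch of Mixup in plain Python. It is not the specific instantiation analyzed in this work: drawing the mixing weight from a Beta(alpha, alpha) distribution is a common convention assumed here, and the names `mixup_pair`, `mixup_batch`, `alpha`, and the toy data are hypothetical.

```python
import random


def mixup_pair(x_i, y_i, x_j, y_j, lam):
    """Convex combination of two examples and of their one-hot labels."""
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_i, x_j)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_i, y_j)]
    return x_mix, y_mix


def mixup_batch(xs, ys, alpha=1.0, rng=None):
    """Mix every example in a batch with a randomly chosen partner.

    Assumption: one mixing weight lam ~ Beta(alpha, alpha) is shared
    across the whole batch, as in many common implementations.
    """
    rng = rng or random.Random()
    lam = rng.betavariate(alpha, alpha)  # mixing weight in [0, 1]
    partners = list(range(len(xs)))
    rng.shuffle(partners)  # random partner index for each example
    mixed = [
        mixup_pair(xs[i], ys[i], xs[j], ys[j], lam)
        for i, j in enumerate(partners)
    ]
    return [m[0] for m in mixed], [m[1] for m in mixed], lam


# Toy batch: three 2-dimensional inputs with one-hot labels over 3 classes.
xs = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
ys = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
x_mix, y_mix, lam = mixup_batch(xs, ys, alpha=1.0, rng=random.Random(0))
```

Because each mixed label is a convex combination of one-hot vectors, it remains a valid probability vector, so a model can be trained on `(x_mix, y_mix)` with an ordinary cross-entropy loss against soft targets.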