Mixup is a data augmentation technique that trains on random convex combinations of data points and their labels. In recent years, Mixup has become a standard primitive in the training of state-of-the-art image classification models, owing to its demonstrated benefits over empirical risk minimization with regard to generalization and robustness. In this work, we try to explain some of this success from a feature learning perspective. We focus our attention on classification problems in which each class may have multiple associated features (or views) that can be used to predict the class correctly. Our main theoretical results demonstrate that, for a non-trivial class of data distributions with two features per class, training a 2-layer convolutional network using empirical risk minimization can lead to learning only one feature for almost all classes, while training with a specific instantiation of Mixup succeeds in learning both features for every class. We also show empirically that these theoretical insights extend to practical settings: image benchmarks modified to have multiple features.
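The "random convex combinations of data points and their labels" mentioned above can be sketched as follows. This is a minimal NumPy sketch of the standard Mixup rule (mixing weight drawn from a Beta(α, α) distribution, as in the original Mixup formulation); the function name and argument layout are illustrative, not taken from this paper.

```python
import numpy as np

def mixup(x1, y1, x2, y2, alpha=1.0, rng=None):
    """Return a convex combination of two examples and their one-hot labels.

    A hypothetical helper illustrating the standard Mixup rule:
        x = lam * x1 + (1 - lam) * x2,  y = lam * y1 + (1 - lam) * y2,
    with lam ~ Beta(alpha, alpha).
    """
    rng = np.random.default_rng() if rng is None else rng
    lam = rng.beta(alpha, alpha)          # mixing coefficient in [0, 1]
    x = lam * x1 + (1.0 - lam) * x2       # convex combination of inputs
    y = lam * y1 + (1.0 - lam) * y2       # same combination of labels
    return x, y
```

In training, the two examples are typically a minibatch and a random permutation of itself, so each gradient step sees mixed inputs paired with correspondingly mixed (soft) labels.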