Data augmentation is an essential building block for training efficient deep learning models. Among all augmentation techniques proposed so far, linear interpolation of training data points, also called mixup, has been found to be effective for a wide range of applications. While most works have focused on selecting the right points to mix or on applying complex non-linear interpolation, we are interested in mixing similar points more frequently and more strongly than less similar ones. To this end, we propose to dynamically change the underlying distribution of interpolation coefficients through warping functions, depending on the similarity between the data points being combined. We define an efficient and flexible framework to do so without losing diversity. We provide extensive experiments on classification and regression tasks, showing that our proposed method improves both the performance and the calibration of models. Code is available at https://github.com/ENSTA-U2IS/torch-uncertainty
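The core idea, warping the mixup coefficient according to how similar two points are, can be illustrated with a small, dependency-free Python sketch. It is not the method from the paper or the torch-uncertainty implementation. The warping function ω_τ(λ) = λ^τ / (λ^τ + (1 − λ)^τ), the Gaussian-kernel similarity, and the parameters `sigma`, `tau_min` and `tau_max` are hypothetical choices made for illustration only. The coefficient λ is first drawn from Beta(α, α) as in standard mixup. It is then warped with an exponent τ that shrinks as the two inputs become more similar. Similar pairs therefore get coefficients pulled toward 0.5 (stronger mixing), while dissimilar pairs get coefficients pushed toward 0 or 1 (weaker mixing).

```python
import math
import random
from typing import List, Sequence, Tuple


def warp(lam: float, tau: float) -> float:
    """Hypothetical symmetric warping of [0, 1] onto itself.

    tau < 1 pulls lam toward 0.5 (stronger mixing), tau > 1 pushes it toward
    0 or 1 (weaker mixing), and tau == 1 leaves lam unchanged.
    """
    if tau <= 0:
        raise ValueError("tau must be positive")
    a = lam ** tau
    b = (1.0 - lam) ** tau
    return a / (a + b)


def similarity(x1: Sequence[float], x2: Sequence[float], sigma: float = 1.0) -> float:
    """Gaussian kernel on the Euclidean distance (an assumed similarity measure).

    Returns 1.0 for identical inputs and decays toward 0 as they move apart.
    """
    if len(x1) != len(x2):
        raise ValueError("inputs must have the same length")
    d2 = sum((a - b) ** 2 for a, b in zip(x1, x2))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def tau_from_similarity(s: float, tau_min: float = 0.5, tau_max: float = 4.0) -> float:
    """Map similarity s in [0, 1] to a warping exponent (hypothetical schedule).

    Geometric interpolation: s = 1 gives tau_min (strong mixing),
    s = 0 gives tau_max (weak mixing).
    """
    return tau_min ** s * tau_max ** (1.0 - s)


def similarity_warped_mixup(
    x1: Sequence[float],
    y1: Sequence[float],
    x2: Sequence[float],
    y2: Sequence[float],
    alpha: float = 1.0,
    sigma: float = 1.0,
    rng=random,
) -> Tuple[List[float], List[float], float]:
    """Mix one pair of (input, one-hot label) with a similarity-warped coefficient."""
    lam = rng.betavariate(alpha, alpha)        # standard mixup draw
    tau = tau_from_similarity(similarity(x1, x2, sigma))
    w = warp(lam, tau)                         # similarity-dependent coefficient
    x_mix = [w * a + (1.0 - w) * b for a, b in zip(x1, x2)]
    y_mix = [w * a + (1.0 - w) * b for a, b in zip(y1, y2)]
    return x_mix, y_mix, w


if __name__ == "__main__":
    rng = random.Random(0)
    # Two nearby inputs with different one-hot labels.
    x_mix, y_mix, w = similarity_warped_mixup(
        [0.0, 1.0], [1.0, 0.0], [0.1, 0.9], [0.0, 1.0], rng=rng
    )
    print(w, x_mix, y_mix)
```

Because this particular ω_τ is a continuous, increasing map of [0, 1] onto itself for every τ > 0, every coefficient remains reachable after warping; only how likely each one is changes with similarity.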