This paper analyzes the regularization effect that data augmentation induces on supervised regression methods in the proportional regime, where the number of covariates grows proportionally to the number of samples. We provide a tight characterization of the test error, measured as mean squared error, solely in terms of population quantities of the true data and the first- and second-order statistics of the augmentation scheme. Our results hold under misspecified feature maps and for any network architecture in which only the last (readout) layer is trained while the rest of the network is either frozen or randomly initialized. We specialize our results to the case of Gaussian data and show that our asymptotic characterization is tight in this setting.
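To make the regularization effect concrete, the following minimal sketch uses a classical special case rather than the paper's general setting. It assumes a frozen random first layer with a tanh nonlinearity, a linear teacher, and an augmentation that adds isotropic Gaussian noise directly to the features; all of these choices, and every function and variable name, are illustrative assumptions. In this case, averaging the squared loss over the augmentation gives a ridge objective with penalty `n * noise_std**2`, so the readout trained on many augmented copies should be close to the ridge readout, and only the mean and covariance of the augmentation enter.

```python
import numpy as np


def ridge_solution(features, targets, penalty):
    """Closed-form ridge readout: (Phi^T Phi + penalty * I)^{-1} Phi^T y."""
    dim = features.shape[1]
    gram = features.T @ features + penalty * np.eye(dim)
    return np.linalg.solve(gram, features.T @ targets)


def augmented_least_squares(features, targets, noise_std, copies, rng):
    """Ordinary least squares on `copies` noisy replicas of each sample.

    Assumption: the augmentation is additive isotropic Gaussian noise
    applied in feature space, which is a simplification.
    """
    stacked = np.tile(features, (copies, 1))
    stacked = stacked + noise_std * rng.standard_normal(stacked.shape)
    stacked_targets = np.tile(targets, copies)
    weights, *_ = np.linalg.lstsq(stacked, stacked_targets, rcond=None)
    return weights


rng = np.random.default_rng(0)
n, d, p = 200, 10, 20  # samples, input dimension, number of features

# Frozen random first layer; only the readout weights are fitted.
frozen_layer = rng.standard_normal((d, p)) / np.sqrt(d)
x = rng.standard_normal((n, d))
teacher = rng.standard_normal(d) / np.sqrt(d)
y = x @ teacher + 0.1 * rng.standard_normal(n)
phi = np.tanh(x @ frozen_layer)

noise_std = 0.5
w_aug = augmented_least_squares(phi, y, noise_std, copies=500, rng=rng)
w_ridge = ridge_solution(phi, y, penalty=n * noise_std**2)

# Relative distance between the augmented and ridge readouts.
rel_gap = np.linalg.norm(w_aug - w_ridge) / np.linalg.norm(w_ridge)
print(f"relative gap between augmented and ridge readouts: {rel_gap:.3f}")
```

The gap shrinks as `copies` grows, because the empirical second moment of the injected noise concentrates around `noise_std**2` times the identity. More general augmentation schemes need not reduce to a plain ridge penalty.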