This paper investigates methods for improving generative data augmentation for deep learning. Generative data augmentation uses synthetic samples produced by generative models as an additional dataset for classification in small-dataset settings. A key challenge is that the synthetic data contain uninformative samples that degrade accuracy, because synthetic samples do not perfectly represent the class categories in real data and uniform sampling does not necessarily yield samples that are useful for the task. We present a novel strategy for generative data augmentation called meta generative regularization (MGR). To avoid this degradation, MGR uses synthetic samples in a regularization term for feature extractors rather than in the loss function (e.g., cross-entropy). These synthetic samples are dynamically determined through meta-learning so as to minimize the validation loss. We observed that MGR avoids the performance degradation of naïve generative data augmentation and improves on the baselines. Experiments on six datasets showed that MGR is particularly effective when datasets are small and that it stably outperforms the baselines.
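
As a reading aid, the two ingredients described above (synthetic samples entering only through a regularizer, and those samples being chosen by meta-learning against a validation loss) can be sketched as a bilevel problem. All notation here is assumed for illustration and is not taken from the abstract: $\theta$ denotes the classifier parameters, $\phi$ the parameters that select the synthetic samples $\mathcal{S}_{\phi}$, $\mathcal{R}$ an unspecified regularizer on the feature extractor, and $\lambda$ a weighting coefficient.

```latex
% Hypothetical notation; the abstract does not specify the form of R or how S_phi is produced.
\begin{align}
  % Inner problem: real data in the cross-entropy loss, synthetic data only in the regularizer.
  \theta^{*}(\phi) &= \arg\min_{\theta}\;
      \mathcal{L}_{\mathrm{CE}}\bigl(\theta;\, \mathcal{D}_{\mathrm{train}}\bigr)
      + \lambda\, \mathcal{R}\bigl(\theta;\, \mathcal{S}_{\phi}\bigr), \\
  % Outer (meta) problem: choose the synthetic samples that minimize the validation loss.
  \phi^{*} &= \arg\min_{\phi}\;
      \mathcal{L}_{\mathrm{val}}\bigl(\theta^{*}(\phi);\, \mathcal{D}_{\mathrm{val}}\bigr).
\end{align}
```

Under this reading, naïve generative data augmentation corresponds to placing $\mathcal{S}$ directly inside $\mathcal{L}_{\mathrm{CE}}$ with uniformly sampled synthetic data, which is the setting the abstract reports as prone to degradation.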