Deep learning models frequently suffer from problems such as class imbalance and a lack of robustness to distribution shift. Suitable training data beyond the available benchmarks is often difficult to find, especially for computer vision models. With the advent of Generative Adversarial Networks (GANs), however, it is now possible to generate high-quality synthetic data, which can be used to alleviate some of these challenges. In this work, we present a detailed analysis of the effect of training computer vision models on different proportions of synthetic data mixed with real (organic) data. We analyze how the quantity of synthetic data mixed with the original data affects a model's robustness to out-of-distribution data and the general quality of its predictions.