Synthetic data offers the promise of cheap and bountiful training data for settings where labeled real-world data is scarce. However, models trained on synthetic data significantly underperform when evaluated on real-world data. In this paper, we propose Proportional Amplitude Spectrum Training Augmentation (PASTA), a simple and effective augmentation strategy to improve out-of-the-box synthetic-to-real (syn-to-real) generalization performance. PASTA perturbs the amplitude spectra of synthetic images in the Fourier domain to generate augmented views. Specifically, PASTA uses a structured perturbation strategy in which high-frequency components are perturbed relatively more than low-frequency ones. For the tasks of semantic segmentation (GTAV-to-Real), object detection (Sim10K-to-Real), and object recognition (VisDA-C Syn-to-Real), across a total of 5 syn-to-real shifts, we find that PASTA outperforms more complex state-of-the-art generalization methods while remaining complementary to them.
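The perturbation described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the abstract does not state the exact noise model, so the multiplicative Gaussian noise, the power-law form of the standard deviation, and the names `radial_sigma`, `pasta_like_augment`, `alpha`, `k`, and `beta` (along with their default values) are all assumptions chosen for illustration. The only property taken from the text is that the amplitude spectrum is perturbed more strongly at high spatial frequencies than at low ones, while the phase is left unchanged.

```python
import numpy as np


def radial_sigma(h, w, alpha, k, beta):
    """Per-frequency noise std that grows with distance from the DC term.

    Assumed form (not from the source): sigma = (alpha * r) ** k + beta,
    where r is the radial spatial frequency normalized to [0, 1].
    """
    fy = np.fft.fftfreq(h)[:, None]  # vertical frequency, cycles/pixel
    fx = np.fft.fftfreq(w)[None, :]  # horizontal frequency, cycles/pixel
    # The largest possible radius is sqrt(0.5**2 + 0.5**2) = sqrt(0.5).
    radius = np.sqrt(fx ** 2 + fy ** 2) / np.sqrt(0.5)
    return (alpha * radius) ** k + beta


def pasta_like_augment(image, alpha=3.0, k=2.0, beta=0.25, rng=None):
    """Return an augmented view of `image` (float array, H x W x C, in [0, 1]).

    Hypothetical helper: only the amplitude spectrum is perturbed, with
    stronger noise at higher frequencies.
    """
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape[:2]

    # 2D FFT over the spatial axes, applied independently per channel.
    spectrum = np.fft.fft2(image, axes=(0, 1))
    amplitude = np.abs(spectrum)
    phase = np.angle(spectrum)

    # Multiplicative noise with mean 1 and frequency-dependent std.
    sigma = radial_sigma(h, w, alpha, k, beta)[..., None]
    noise = rng.normal(1.0, sigma, size=amplitude.shape)

    # Recombine the perturbed amplitude with the original phase.
    perturbed = amplitude * noise * np.exp(1j * phase)

    # The noise is not conjugate-symmetric, so the inverse transform is
    # complex in general; this sketch keeps only the real part.
    augmented = np.fft.ifft2(perturbed, axes=(0, 1)).real
    return np.clip(augmented, 0.0, 1.0)
```

In a training pipeline, a function like this would presumably be applied to each synthetic image before the usual normalization step. Setting `alpha=0` makes the noise level uniform across frequencies, which gives a simple baseline for checking whether the frequency-proportional weighting matters.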