Conditional image generative models hold considerable promise for producing unlimited amounts of synthetic training data. Yet recent progress in generation quality has come at the expense of generation diversity, limiting the utility of these models as a source of synthetic training data. Although guidance-based approaches have been introduced to improve the utility of generated data by focusing on quality or diversity, their (implicit or explicit) utility functions often disregard the potential distribution shift between synthetic and real data. In this work, we introduce Chamfer Guidance: a training-free guidance approach that leverages a handful of real exemplar images to characterize the quality and diversity of synthetic data. We show that Chamfer Guidance boosts the diversity of generations w.r.t. a dataset of real images while maintaining or improving generation quality on ImageNet-1k and standard geo-diversity benchmarks. Our approach achieves state-of-the-art few-shot performance with as few as 2 real exemplar images, obtaining 96.4% precision and 86.4% distributional coverage, which increase to 97.5% and 92.7%, respectively, when using 32 real images. We showcase the benefits of Chamfer Guidance by training downstream image classifiers on synthetic data, achieving accuracy boosts over the baselines of up to 15% in-distribution and up to 16% out-of-distribution. Furthermore, our approach does not require the unconditional model, and thus obtains a 31% reduction in FLOPs at sampling time w.r.t. classifier-free-guidance-based approaches.
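The core idea of scoring synthetic data against a handful of real exemplars can be illustrated with a symmetric Chamfer-style distance between two sets of image embeddings. The sketch below is a minimal, hypothetical illustration (the function name, feature extraction, and weighting are assumptions, not the paper's exact formulation): the generated-to-real term rewards quality (each generation lies near some real exemplar), and the real-to-generated term rewards coverage (each exemplar is matched by some generation).

```python
import numpy as np

def chamfer_score(gen_feats, real_feats):
    """Symmetric Chamfer distance between two sets of feature vectors.

    gen_feats:  (N, D) array of generated-image embeddings.
    real_feats: (M, D) array of real exemplar embeddings.
    Lower is better: generations are close to exemplars (quality)
    and exemplars are covered by generations (diversity).
    """
    # Pairwise squared Euclidean distances, shape (N, M).
    d2 = ((gen_feats[:, None, :] - real_feats[None, :, :]) ** 2).sum(-1)
    # Quality term: each generated sample near its closest real exemplar.
    gen_to_real = d2.min(axis=1).mean()
    # Coverage term: each real exemplar near its closest generation.
    real_to_gen = d2.min(axis=0).mean()
    return gen_to_real + real_to_gen
```

In a guidance setting, a score of this form would be evaluated on features of intermediate generations and its gradient used to steer sampling; since it depends only on the conditional model's samples and a few exemplar embeddings, no unconditional forward pass is needed.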