Neon: Negative Extrapolation From Self-Training Improves Image Generation

Scaling generative AI models is bottlenecked by the scarcity of high-quality training data. The ease of synthesizing from a generative model suggests using (unverified) synthetic data to augment a limited corpus of real data for the purpose of fine-tuning in the hope of improving performance. Unfortunately, however, the resulting positive feedback loop leads to model autophagy disorder (MAD, aka model collapse) that results in a rapid degradation in sample quality and/or diversity. In this paper, we introduce Neon (for Negative Extrapolation frOm self-traiNing), a new learning method that turns the degradation from self-training into a powerful signal for self-improvement. Given a base model, Neon first fine-tunes it on its own self-synthesized data but then, counterintuitively, reverses its gradient updates to extrapolate away from the degraded weights. We prove that Neon works because typical inference samplers that favor high-probability regions create a predictable anti-alignment between the synthetic and real data population gradients, which negative extrapolation corrects to better align the model with the true data distribution. Neon is remarkably easy to implement via a simple post-hoc merge that requires no new real data, works effectively with as few as 1k synthetic samples, and typically uses less than 1% additional training compute. We demonstrate Neon's universality across a range of architectures (diffusion, flow matching, autoregressive, and inductive moment matching models) and datasets (ImageNet, CIFAR-10, and FFHQ). In particular, on ImageNet 256x256, Neon elevates the xAR-L model to a new state-of-the-art FID of 1.02 with only 0.36% additional training compute. Code is available at https://github.com/VITA-Group/Neon
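The post-hoc merge described above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the abstract does not give the exact formula, so the form `base - w * (finetuned - base)`, the extrapolation weight `w`, the function name `negative_extrapolation`, and the dictionary-of-lists checkpoint format are all hypothetical. The linked repository has the actual code.

```python
from typing import Dict, List

# Hypothetical checkpoint format: parameter name -> flat list of floats.
Weights = Dict[str, List[float]]


def negative_extrapolation(base: Weights, finetuned: Weights, w: float) -> Weights:
    """Step away from the self-trained weights.

    Assumed merge rule (not stated in the abstract):
        merged = base - w * (finetuned - base)
    where `finetuned` is the base model after fine-tuning on its own samples.
    """
    if base.keys() != finetuned.keys():
        raise ValueError("checkpoints must share the same parameter names")
    merged: Weights = {}
    for name, base_vals in base.items():
        ft_vals = finetuned[name]
        if len(base_vals) != len(ft_vals):
            raise ValueError(f"size mismatch for parameter {name!r}")
        # Reverse the fine-tuning update (f - b) and scale it by w.
        merged[name] = [b - w * (f - b) for b, f in zip(base_vals, ft_vals)]
    return merged


# Toy checkpoints, made up purely for illustration.
base = {"layer.weight": [1.0, 2.0], "layer.bias": [0.5]}
finetuned = {"layer.weight": [1.5, 1.0], "layer.bias": [0.5]}
merged = negative_extrapolation(base, finetuned, w=1.0)
```

With `w = 0` the merge returns the base weights unchanged; larger `w` moves further in the direction opposite to the self-training update. How `w` should be chosen is not specified in the abstract.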

