Recent studies have demonstrated the effectiveness of token-based methods for visual content generation. As a representative work, non-autoregressive Transformers (NATs) are able to synthesize images with decent quality in a small number of steps. However, NATs usually necessitate configuring a complicated generation policy comprising multiple manually-designed scheduling rules. These heuristic-driven rules are prone to sub-optimality and come with the requirements of expert knowledge and labor-intensive efforts. Moreover, their one-size-fits-all nature cannot flexibly adapt to the diverse characteristics of each individual sample. To address these issues, we propose AdaNAT, a learnable approach that automatically configures a suitable policy tailored to every sample to be generated. Specifically, we formulate the determination of generation policies as a Markov decision process. Under this framework, a lightweight policy network for generation can be learned via reinforcement learning. Importantly, we demonstrate that simple reward designs, such as FID or pre-trained reward models, may not reliably guarantee the desired quality or diversity of generated samples. Therefore, we propose an adversarial reward design to guide the training of policy networks effectively. Comprehensive experiments on four benchmark datasets, i.e., ImageNet-256 & 512, MS-COCO, and CC3M, validate the effectiveness of AdaNAT. Code and pre-trained models will be released at https://github.com/LeapLabTHU/AdaNAT.
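The per-sample policy idea above can be illustrated with a toy REINFORCE loop: a lightweight policy network observes a sample representation and selects one of several candidate generation configurations, and is updated from a scalar reward. Everything in this sketch is an illustrative assumption rather than the paper's implementation: the linear policy, the embedding dimension, and especially the hand-coded `toy_reward` (AdaNAT uses an adversarially learned reward instead).

```python
import numpy as np

rng = np.random.default_rng(0)

N_CONFIGS = 4  # hypothetical candidate generation configs (e.g. step/temperature schedules)
EMB_DIM = 8    # dimensionality of a per-sample conditioning vector (assumed)

W = np.zeros((EMB_DIM, N_CONFIGS))  # a minimal linear "policy network"

def policy_probs(x):
    """Softmax over config logits for one sample embedding x."""
    logits = x @ W
    e = np.exp(logits - logits.max())
    return e / e.sum()

def toy_reward(x, a):
    """Stand-in reward: each sample 'prefers' one config, determined by its
    embedding. In AdaNAT the reward would come from an adversarially trained
    model scoring the generated image, not a rule like this."""
    return 1.0 if a == int(np.argmax(x[:N_CONFIGS])) else 0.0

lr = 0.5
baseline = 0.0  # running-mean baseline to reduce gradient variance
for step in range(2000):
    x = rng.normal(size=EMB_DIM)            # per-sample conditioning
    p = policy_probs(x)
    a = rng.choice(N_CONFIGS, p=p)          # sample a generation config
    r = toy_reward(x, a)
    baseline += 0.01 * (r - baseline)
    grad_logp = -p
    grad_logp[a] += 1.0                     # d log pi(a|x) / d logits
    W += lr * (r - baseline) * np.outer(x, grad_logp)  # REINFORCE update

# Greedy evaluation: the trained policy should pick the preferred config
# far more often than the 1/N_CONFIGS chance rate.
hits = 0
trials = 500
for _ in range(trials):
    x = rng.normal(size=EMB_DIM)
    a = int(np.argmax(policy_probs(x)))
    hits += toy_reward(x, a)
acc = hits / trials
```

The point of the sketch is only the structure of the optimization: because the "quality" of a configuration is a non-differentiable, per-sample quantity, the policy is trained with a score-function (REINFORCE) estimator rather than backpropagation through the generator.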