We introduce BiGR, a novel conditional image generation model that uses compact binary latent codes for generative training, with the goal of improving both generation and representation capabilities. BiGR is the first conditional generative model to unify generation and discrimination within a single framework. It comprises a binary tokenizer, a masked modeling mechanism, and a binary transcoder that predicts binary codes. We also introduce a novel entropy-ordered sampling method for efficient image generation. Extensive experiments validate BiGR's superior generation quality, measured by FID-50k, and its representation capabilities, measured by linear-probe accuracy. BiGR also generalizes zero-shot across various vision tasks, enabling image inpainting, outpainting, editing, interpolation, and enrichment without structural modifications. Our findings suggest that BiGR unifies generative and discriminative tasks effectively, paving the way for further advances in the field. We further extend BiGR to text-to-image generation, showcasing its potential for broader applications.
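As an illustration of how entropy-ordered sampling over binary codes might work, the following is a minimal sketch and not BiGR's actual implementation. It assumes that each image token is a K-bit binary code, that the model outputs an independent Bernoulli probability per bit, and that the masked tokens with the lowest total entropy (the most confident predictions) are unmasked first. The function names `bernoulli_entropy` and `entropy_ordered_step`, the array shapes, and the lowest-entropy-first rule are all assumptions made for this example.

```python
import numpy as np


def bernoulli_entropy(p, eps=1e-8):
    """Element-wise entropy (in nats) of Bernoulli distributions with parameter p."""
    p = np.clip(p, eps, 1.0 - eps)  # avoid log(0)
    return -(p * np.log(p) + (1.0 - p) * np.log(1.0 - p))


def entropy_ordered_step(probs, masked, num_to_unmask, rng):
    """Pick the most confident masked tokens and sample their binary codes.

    probs:  (N, K) array, assumed per-bit probabilities of bit == 1
    masked: (N,) boolean array, True where the token is still masked
    Returns the indices of the chosen tokens and their sampled (n, K) bits.
    """
    # Assumption: token-level uncertainty is the sum of its per-bit entropies.
    token_entropy = bernoulli_entropy(probs).sum(axis=1)
    # Already-unmasked tokens must never be selected again.
    token_entropy = np.where(masked, token_entropy, np.inf)
    order = np.argsort(token_entropy, kind="stable")
    n = min(num_to_unmask, int(masked.sum()))
    chosen = order[:n]
    # Sample each bit independently from its Bernoulli distribution.
    bits = (rng.random((n, probs.shape[1])) < probs[chosen]).astype(np.uint8)
    return chosen, bits


# Toy usage: three tokens with 4-bit codes and made-up probabilities.
rng = np.random.default_rng(0)
probs = np.array([
    [0.50, 0.50, 0.50, 0.50],  # maximally uncertain token
    [0.99, 0.01, 0.99, 0.01],  # very confident token
    [0.90, 0.10, 0.90, 0.10],  # moderately confident token
])
masked = np.array([True, True, True])
chosen, bits = entropy_ordered_step(probs, masked, num_to_unmask=1, rng=rng)
```

In a full sampler, a step like this would be repeated: the chosen tokens are filled in, the model is re-run on the partially unmasked sequence, and the loop continues until no masked tokens remain.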