Infinity: Scaling Bitwise AutoRegressive Modeling for High-Resolution Image Synthesis

We present Infinity, a Bitwise Visual AutoRegressive model capable of generating high-resolution, photorealistic images that follow language instructions. Infinity redefines the visual autoregressive model under a bitwise token prediction framework with an infinite-vocabulary tokenizer and classifier and a bitwise self-correction mechanism, markedly improving generation capacity and detail. By theoretically scaling the tokenizer vocabulary size to infinity while concurrently scaling the transformer size, our method unlocks substantially stronger scaling capabilities than vanilla VAR. Infinity sets a new record for autoregressive text-to-image models, outperforming top-tier diffusion models such as SD3-Medium and SDXL. Notably, Infinity surpasses SD3-Medium by improving the GenEval benchmark score from 0.62 to 0.73 and the ImageReward benchmark score from 0.87 to 0.96, achieving a win rate of 66%. Without extra optimization, Infinity generates a high-quality 1024x1024 image in 0.8 seconds, making it 2.6x faster than SD3-Medium and establishing it as the fastest text-to-image model. Models and code will be released to promote further exploration of Infinity for visual generation and unified tokenizer modeling.
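To illustrate why predicting bits rather than vocabulary indices lets the vocabulary grow so freely, here is a minimal sketch; it is not the authors' implementation. It assumes each visual token is a string of `bits` binary values, so the implied vocabulary has 2^bits entries, and it compares the number of output logits a conventional index classifier would need with the number needed when every bit gets its own binary classifier. All function names are hypothetical and chosen only for this illustration.

```python
def index_classifier_outputs(bits: int) -> int:
    """Logits needed to classify over every code of a `bits`-bit token."""
    return 2 ** bits


def bitwise_classifier_outputs(bits: int) -> int:
    """Logits needed when each bit gets its own binary (0/1) classifier."""
    return 2 * bits


def bits_to_index(bit_values: list[int]) -> int:
    """Pack a list of 0/1 bits (most significant first) into one index."""
    index = 0
    for b in bit_values:
        index = (index << 1) | b
    return index


if __name__ == "__main__":
    # Assumed example widths; the abstract does not state the widths used.
    for bits in (8, 16, 32):
        print(bits, index_classifier_outputs(bits), bitwise_classifier_outputs(bits))
```

Under these assumptions the index classifier grows exponentially with the bit width while the bitwise classifier grows linearly, which is one way to read the abstract's claim that the vocabulary can be scaled "to infinity" in theory.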

