We introduce the Sphere Encoder, an efficient generative framework capable of producing images in a single forward pass and competing with many-step diffusion models using fewer than five steps. Our approach works by learning an encoder that maps natural images uniformly onto a spherical latent space, and a decoder that maps random latent vectors back to the image space. Trained solely through image reconstruction losses, the model generates an image by simply decoding a random point on the sphere. Our architecture naturally supports conditional generation, and looping the encoder/decoder a few times can further enhance image quality. Across several datasets, the sphere encoder approach yields performance competitive with state of the art diffusions, but with a small fraction of the inference cost. Project page is available at https://sphere-encoder.github.io .
翻译:本文提出球面编码器(Sphere Encoder),这是一种高效的生成框架,能够通过单次前向传播生成图像,并在少于五步的采样步骤中与多步扩散模型相竞争。我们的方法通过学习一个编码器将自然图像均匀映射到球面隐空间,并训练一个解码器将随机隐向量映射回图像空间。该模型仅通过图像重建损失进行训练,通过直接解码球面上的随机点即可生成图像。我们的架构天然支持条件生成,且对编码器/解码器进行数次循环迭代可进一步提升图像质量。在多个数据集上的实验表明,球面编码器方法取得了与当前最优扩散模型相当的性能,同时仅需极低的推理成本。项目页面详见 https://sphere-encoder.github.io。