Any-scale image synthesis offers an efficient and scalable way to synthesize photo-realistic images at any scale, even beyond 2K resolution. However, existing GAN-based solutions depend heavily on convolutions and a hierarchical architecture, which introduce inconsistency and the "texture sticking" issue when the output resolution is scaled. INR-based generators, in contrast, are scale-equivariant by design, but their large memory footprint and slow inference hinder their adoption in large-scale or real-time systems. In this work, we propose Column-Row Entangled Pixel Synthesis (CREPS), a new generative model that is both efficient and scale-equivariant without using any spatial convolutions or coarse-to-fine design. To reduce the memory footprint and make the system scalable, we employ a novel bi-line representation that decomposes layer-wise feature maps into separate "thick" column and row encodings. Experiments on various datasets, including FFHQ, LSUN-Church, MetFaces, and Flickr-Scenery, confirm CREPS' ability to synthesize scale-consistent and alias-free images at arbitrary resolutions with reasonable training and inference speed. Code is available at https://github.com/VinAIResearch/CREPS.
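
The memory argument behind the bi-line representation can be illustrated with a small NumPy sketch. This is not the authors' implementation: the tensor layout, the thickness parameter `D`, the per-channel matrix product used to recombine the two encodings, and the `1/sqrt(D)` scaling are all assumptions made for illustration, and `compose_bi_line` is a hypothetical helper name.

```python
import numpy as np


def compose_bi_line(row_enc: np.ndarray, col_enc: np.ndarray) -> np.ndarray:
    """Recombine "thick" row and column encodings into a full feature map.

    Assumed layout (illustrative only):
      row_enc: (C, H, D)  one D-dimensional code per channel and row
      col_enc: (C, D, W)  one D-dimensional code per channel and column
    Returns a (C, H, W) feature map.
    """
    thickness = row_enc.shape[-1]
    # Batched matrix product over the channel axis; the scaling by
    # sqrt(thickness) is an assumed normalization, not taken from the source.
    return np.matmul(row_enc, col_enc) / np.sqrt(thickness)


rng = np.random.default_rng(0)
C, H, W, D = 8, 64, 48, 4  # channels, height, width, assumed thickness

row_enc = rng.standard_normal((C, H, D))
col_enc = rng.standard_normal((C, D, W))
feat = compose_bi_line(row_enc, col_enc)

# Values stored per layer: O(C * D * (H + W)) for the two encodings
# versus O(C * H * W) for a dense feature map.
bi_line_size = row_enc.size + col_enc.size
full_size = feat.size
```

Under these assumptions the stored encodings grow linearly with the image side length while a dense feature map grows quadratically, which is one way to read the abstract's claim that the decomposition reduces the memory footprint.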