Visual diffusion models have achieved remarkable progress, yet they are typically trained at limited resolutions due to the scarcity of high-resolution data and constrained computational resources, hampering their ability to generate high-fidelity images or videos at higher resolutions. Recent efforts have explored tuning-free strategies to unlock the untapped potential of pre-trained models for higher-resolution visual generation. However, these methods remain prone to producing low-quality visual content with repetitive patterns. The key obstacle is the inevitable increase in high-frequency information when the model generates content beyond its training resolution, which leads to undesirable repetitive patterns arising from accumulated errors. To tackle this challenge, we propose FreeScale, a tuning-free inference paradigm that enables higher-resolution visual generation via scale fusion. Specifically, FreeScale processes information from different receptive scales and then fuses it by extracting the desired frequency components. Extensive experiments validate the superiority of our paradigm in extending the higher-resolution generation capabilities of both image and video models. Notably, compared with the previous best-performing method, FreeScale unlocks the generation of 8K-resolution images for the first time.
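The scale-fusion idea can be illustrated with a minimal frequency-domain sketch: take low-frequency structure from a feature map computed at the global receptive scale and high-frequency detail from one computed at the local scale. The function names, the FFT-based low-pass filter, and the specific fusion rule below are illustrative assumptions, not FreeScale's actual implementation.

```python
import numpy as np

def lowpass(x, cutoff):
    """Keep only spatial frequencies below `cutoff` (a fraction of Nyquist)."""
    f = np.fft.fftshift(np.fft.fft2(x))
    h, w = x.shape
    yy, xx = np.mgrid[:h, :w]
    cy, cx = h // 2, w // 2
    # circular low-pass mask in the centered frequency plane
    mask = ((yy - cy) / (h / 2)) ** 2 + ((xx - cx) / (w / 2)) ** 2 <= cutoff ** 2
    return np.real(np.fft.ifft2(np.fft.ifftshift(f * mask)))

def fuse_scales(global_feat, local_feat, cutoff=0.25):
    # Low frequencies (layout, structure) from the global receptive scale;
    # high frequencies (fine detail) from the local receptive scale.
    return lowpass(global_feat, cutoff) + (local_feat - lowpass(local_feat, cutoff))
```

When both inputs are identical, the fusion is an identity up to floating-point error, so the operator only changes the output where the two receptive scales actually disagree in some frequency band.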