RefGC-SR$^2$: Reference-guided Generated Content Super-Resolution and Refinement

From arXiv. The first two authors contributed equally to this work. The last two authors are co-corresponding authors. Project page: https://cmlab-korea.github.io/RefGC-SR2/

Reference-guided generation (e.g., object compositing, customization) has progressed rapidly, yet current pipelines share a fundamental limitation: the object-centric high-resolution reference image (HRRI) provided by the user is downsampled to a fixed low resolution (LR) before being fed into the model, so fine-grained details are discarded before the output is even produced. The generation step then introduces its own artifacts (e.g., identity distortion) on top of this loss. Existing reference-guided generated content refinement (RefGCR) methods can correct some of these artifacts but still operate in the LR domain, while reference-guided super-resolution (RefSR) methods recover resolution but assume natural-image degradations and ignore the artifact distribution of generative pipelines. To address both gaps in a single formulation, we introduce a new task: reference-guided generated content super-resolution-refinement (RefGC-SR$^2$), in which the original HRRI is reused at the post-processing stage to recover lost details, refine generative artifacts, and upscale the output simultaneously. We construct the first real-world triplet data generation pipeline for the RefGC-SR$^2$ task, training a diptych-conditioned generator to synthesize paired low-quality anchors that public pretrained models cannot provide. We further present a frequency-aware diffusion transformer model for RefGC-SR$^2$ that selectively injects fine details from the HRRI while removing generative artifacts. Extensive experiments demonstrate that our RefGC-SR$^2$ model (i) refines the object identity faithfully with respect to the reference and (ii) recovers high-resolution details, so that the final result is of significantly higher quality and more practically usable than those of existing RefGCR and RefSR baselines.
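The first limitation, that downsampling the reference discards fine detail before generation begins, can be illustrated with a toy sketch. This is not the paper's pipeline or model: the 8x8 checkerboard "reference", the box-average downsampler, and the helper names `box_downsample` and `variance` are all assumptions chosen to keep the example self-contained.

```python
def box_downsample(img, factor):
    """Average non-overlapping factor x factor blocks.

    A simple stand-in for the fixed-resolution resize applied to a
    reference image; real pipelines may use other resampling filters.
    """
    h, w = len(img), len(img[0])
    out = []
    for y in range(0, h - h % factor, factor):
        row = []
        for x in range(0, w - w % factor, factor):
            block = [img[y + dy][x + dx]
                     for dy in range(factor) for dx in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


def variance(img):
    """Pixel variance, used here as a crude proxy for texture detail."""
    vals = [v for row in img for v in row]
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)


# Hypothetical 8x8 "high-resolution reference" whose only content is a
# one-pixel checkerboard, i.e. purely fine-grained texture.
hr = [[float((x + y) % 2) for x in range(8)] for y in range(8)]

# Downsample by 2x, as a reference would be shrunk before entering a model.
lr = box_downsample(hr, 2)

hr_var = variance(hr)
lr_var = variance(lr)
print(hr_var, lr_var)
```

In this toy case the high-resolution variance is 0.25 while the downsampled variance is 0: every 2x2 block averages to the same gray value, so no later processing of the low-resolution input alone can recover the texture. That is the gap that reusing the original HRRI at the post-processing stage is meant to close.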
