Generative AI (GenAI) image editors, such as Nano Banana, produce visually compelling results for retouching tasks, enabling non-experts to edit images through text prompts alone. However, the generative nature of these models often introduces spatial misalignment, texture distortion, and content hallucination, all of which are detrimental to downstream workflows that require pixel-level fidelity. We identify a problem setting we call "structure-preserving GenAI fusion" for black-box GenAI image retouching: retain the perceptual enhancements of a GenAI output while enforcing structural faithfulness to the original input image. To address this problem, we propose a post-processing framework that fuses an input image with its GenAI-enhanced counterpart by first establishing coarse spatial and photometric correspondences, then performing a fusion stage that transfers desired enhancements while suppressing hallucinated content. In the absence of direct prior work in this setting, we evaluate our framework against representative methods from photorealistic style transfer and image fusion. Our experiments demonstrate that our method better preserves aesthetic quality while maintaining pixel-level structural consistency and the input resolution.