Mitigating Content Shift and Hallucination in GenAI Image Editing via Structural Refinement

Generative AI (GenAI) image editors, such as Nano Banana, produce visually compelling results for retouching tasks, enabling non-experts to edit images through text prompts alone. However, the generative nature of these models often introduces spatial misalignment, texture distortion, and content hallucination, all of which are detrimental to downstream workflows that require pixel-level fidelity. We identify a problem setting we call "structure-preserving GenAI fusion" for black-box GenAI image retouching: retain the perceptual enhancements of a GenAI output while enforcing structural faithfulness to the original input image. To address this problem, we propose a post-processing framework that fuses an input image with its GenAI-enhanced counterpart by first establishing coarse spatial and photometric correspondences, then performing a fusion stage that transfers desired enhancements while suppressing hallucinated content. In the absence of direct prior work in this setting, we evaluate our framework against representative methods from photorealistic style transfer and image fusion. Our experiments demonstrate that our method better preserves aesthetic quality while maintaining pixel-level structural consistency and the input resolution.
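The two-stage idea described above (coarse alignment, then fusion) can be illustrated with a deliberately simple stand-in. The sketch below is not the framework proposed in the paper, whose details are not given here. It assumes a plain frequency split: the GenAI output is resampled to the input resolution, its low-frequency band (tone and colour) is kept, and the high-frequency detail is taken from the original input so that edges and textures stay where they were. The function names `resize_nearest`, `box_blur`, and `fuse`, and the `radius` parameter, are hypothetical.

```python
import numpy as np


def resize_nearest(img, height, width):
    """Nearest-neighbour resample of an HxWxC array to (height, width)."""
    rows = np.arange(height) * img.shape[0] // height
    cols = np.arange(width) * img.shape[1] // width
    return img[rows][:, cols]


def box_blur(img, radius):
    """Separable box blur with edge padding; img is an HxWxC float array."""
    k = 2 * radius + 1
    kernel = np.ones(k) / k
    padded = np.pad(img, ((radius, radius), (radius, radius), (0, 0)), mode="edge")
    # Blur along the vertical axis, then the horizontal axis.
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 0, padded)
    out = np.apply_along_axis(lambda v: np.convolve(v, kernel, mode="valid"), 1, out)
    return out


def fuse(input_img, genai_img, radius=8):
    """Illustrative fusion (an assumption, not the paper's method).

    Keeps the low-frequency appearance of the GenAI output and the
    high-frequency structure of the input. Both images are float arrays
    in [0, 1]; the result has the input's resolution.
    """
    h, w = input_img.shape[:2]
    # Coarse spatial step: only resolution is matched here, no warping.
    aligned = resize_nearest(genai_img, h, w)
    low_gen = box_blur(aligned, radius)       # tone and colour from the GenAI output
    detail_in = input_img - box_blur(input_img, radius)  # structure from the input
    return np.clip(low_gen + detail_in, 0.0, 1.0)
```

With identical inputs, `fuse(x, x)` returns `x` up to floating-point error, which is a quick sanity check that the split adds no structure of its own. A real system would need proper geometric registration and an explicit mechanism for rejecting hallucinated regions, neither of which this sketch attempts.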

