Diffusion-based generative models enable powerful image editing, but achieving precise control while maintaining fidelity and safety remains challenging. We present a theoretical and empirical study of controllable diffusion-based image editing, analyzing the trade-offs among adherence to user intent, preservation of non-target content, and output quality. Our work spans text- and mask-guided edits, point/drag manipulation, and inversion-based pipelines. We derive mathematical formulations of editing objectives and analyze the dynamics of noise injection, score guidance, and inversion error. We provide theoretical bounds on reconstruction error, stability under repeated edits, and locality of changes. We propose algorithmic frameworks (with pseudocode) for mask-localized and instruction-guided editing, and present extensive experiments comparing state-of-the-art methods (e.g.\ TF-ICON \cite{lu2023tficone}, DragFlow \cite{zhou2025dragflow}, InstructPix2Pix \cite{brooks2023instructpix2pix}, UltraEdit \cite{zhao2024ultraedit}) on multiple tasks and metrics (FID, identity similarity, CLIP alignment, artifact scores, etc.). Our results reveal key failure modes, such as identity drift, prompt sensitivity, and compositional errors. We also discuss ethical considerations in image editing, including misuse risks, bias, and consent, and we review concept erasure techniques (e.g.\ MACE \cite{lu2024mace}, ANT \cite{li2025ant}, EraseAnything \cite{gao2024eraseanything}) as safeguards. We conclude with best practices and future directions for responsible, high-fidelity diffusion-based editing.