Text-guided image editing with diffusion models has achieved remarkable quality but suffers from prohibitive latency, hindering real-world applications. We introduce FlashEdit, a novel framework designed to enable high-fidelity, real-time image editing. Its efficiency stems from three key innovations: (1) a One-Step Inversion-and-Editing (OSIE) pipeline that bypasses costly iterative processes; (2) a Background Shield (BG-Shield) technique that guarantees background preservation by selectively modifying features only within the edit region; and (3) a Sparsified Spatial Cross-Attention (SSCA) mechanism that ensures precise, localized edits by suppressing semantic leakage to the background. Extensive experiments demonstrate that FlashEdit maintains superior background consistency and structural integrity while performing edits in under 0.2 seconds, an over 150$\times$ speedup over prior multi-step methods. Our code will be made publicly available at https://github.com/JunyiWuCode/FlashEdit.
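The core idea behind BG-Shield, restricting modification to the edit region while leaving background features untouched, can be sketched as a masked feature blend. This is a minimal illustration under assumed shapes, not the paper's actual implementation; the function name, argument names, and tensor layout are all hypothetical:

```python
import numpy as np

def bg_shield_blend(orig_feat, edited_feat, edit_mask):
    """Keep original features outside the edit region, take edited features inside.

    orig_feat, edited_feat: (C, H, W) feature maps (hypothetical layout).
    edit_mask: (H, W) binary mask, 1 inside the edit region, 0 in the background.
    """
    m = edit_mask[None, :, :].astype(orig_feat.dtype)  # broadcast mask over channels
    return m * edited_feat + (1.0 - m) * orig_feat

# Toy example: edit only the top-left quadrant of a 4x4 feature map.
orig = np.zeros((2, 4, 4))
edited = np.ones((2, 4, 4))
mask = np.zeros((4, 4))
mask[:2, :2] = 1
out = bg_shield_blend(orig, edited, mask)
```

In this toy run, `out` equals the edited features inside the masked quadrant and exactly matches the original features everywhere else, which is the background-preservation guarantee the abstract describes.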