One-step image editing makes text-guided editing fast, practical, and easy to deploy, but its underlying mechanism is not yet fully understood. We revisit ChordEdit through reproduction, ablation, and simplification. Our analysis shows that (a) the chord window $\delta$ largely acts as an effective timestep shift from $t$ to $t - \delta$; (b) chord transport acts on high-noise images and mainly performs low-frequency semantic editing; and (c) proximal alignment acts on low-noise images and complements chord transport by adding high-frequency target details. In this view, ChordEdit naturally decomposes editing into a coarse, low-frequency transport stage and a fine, high-frequency alignment stage. These findings suggest a path toward prompt-conditioned dynamic timestep selection for adaptive image editing. All code and results are available at \href{https://github.com/Harvard-AI-and-Robotics-Lab/ChordEdit-Reproduction}{link}.
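The low-frequency versus high-frequency claim can be checked numerically by splitting the spectral energy of an edit residual (edited image minus source image) at a radial frequency cutoff. The following is a minimal sketch of such a check, not the authors' analysis code: the function name `band_energy_fractions`, the `cutoff` value, and the synthetic test patterns are all assumptions introduced here for illustration.

```python
from typing import Tuple

import numpy as np


def band_energy_fractions(residual: np.ndarray, cutoff: float = 0.1) -> Tuple[float, float]:
    """Return the (low, high) fractions of spectral energy in a 2-D residual.

    `cutoff` is a hypothetical radial threshold in cycles per pixel
    (the Nyquist frequency is 0.5 along each axis).
    """
    height, width = residual.shape
    power = np.abs(np.fft.fft2(residual)) ** 2

    # Radial frequency of every FFT bin, in cycles per pixel.
    fy = np.fft.fftfreq(height)[:, None]
    fx = np.fft.fftfreq(width)[None, :]
    radius = np.sqrt(fx ** 2 + fy ** 2)

    total = power.sum()
    if total == 0:
        return 0.0, 0.0  # an all-zero residual carries no energy in either band

    low = float(power[radius <= cutoff].sum() / total)
    return low, 1.0 - low


# Synthetic stand-ins for real residuals (assumed, not measured data).
y, x = np.mgrid[0:64, 0:64]
smooth = np.sin(2 * np.pi * x / 64)        # one cycle across the image
checker = ((x + y) % 2) * 2.0 - 1.0        # alternates every pixel

smooth_low, smooth_high = band_energy_fractions(smooth)
checker_low, checker_high = band_energy_fractions(checker)
```

Under the interpretation in the abstract, a residual produced by chord transport alone would be expected to behave more like `smooth` (energy concentrated below the cutoff), while the extra change contributed by proximal alignment would shift energy toward the high band. For a color image, the same function could be applied per channel or to a grayscale conversion.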