We present a training-free framework for continuous and controllable image editing at test time for text-conditioned generative models. In contrast to prior approaches that rely on additional training or manual user intervention, we find that simple steering in the text-embedding space is sufficient to produce smooth edit control. Given a target concept (e.g., enhancing photorealism or changing facial expression), we use a large language model to automatically construct a small set of debiased contrastive prompt pairs, from which we compute a steering vector in the generator's text-encoder space. We then add this vector directly to the input prompt representation to control generation along the desired semantic axis. To obtain continuous control, we propose an elastic range search procedure that automatically identifies an effective interval of steering magnitudes, avoiding both under-steering (no visible edit) and over-steering (unintended changes to other attributes). Adding scaled versions of the same vector within this interval yields smooth and continuous edits. Because our method modifies only textual representations, it naturally generalizes across text-conditioned modalities, including image and video generation. To quantify steering continuity, we introduce a new evaluation metric that measures the uniformity of semantic change across edit strengths. Comparing continuous editing behavior across methods, we find that, despite its simplicity and lightweight design, our approach is comparable to training-based alternatives and outperforms other training-free methods.
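As a rough illustration of the steering step described in the abstract, the sketch below computes a steering vector from contrastive embedding pairs and adds scaled copies of it to a prompt embedding. It rests on several assumptions that the abstract does not confirm: the vector is taken as the mean difference between positive and negative embeddings, each embedding is treated as a flat list of floats rather than a per-token tensor, and the interval endpoints `low` and `high` are supplied by hand instead of being found by the elastic range search. The function names `steering_vector`, `steer`, and `sweep` are hypothetical.

```python
from typing import List, Sequence, Tuple

Vector = List[float]
Pair = Tuple[Sequence[float], Sequence[float]]  # (positive, negative) embeddings


def steering_vector(pairs: Sequence[Pair]) -> Vector:
    """Average the (positive - negative) difference over all pairs.

    Assumption: mean difference is one common way to build such a vector;
    the abstract does not state the exact formula.
    """
    if not pairs:
        raise ValueError("need at least one contrastive pair")
    dim = len(pairs[0][0])
    total = [0.0] * dim
    for pos, neg in pairs:
        if len(pos) != dim or len(neg) != dim:
            raise ValueError("all embeddings must share one dimension")
        for i in range(dim):
            total[i] += pos[i] - neg[i]
    return [t / len(pairs) for t in total]


def steer(prompt_emb: Sequence[float], vector: Sequence[float],
          strength: float) -> Vector:
    """Return prompt_emb + strength * vector."""
    if len(prompt_emb) != len(vector):
        raise ValueError("prompt embedding and steering vector differ in size")
    return [p + strength * v for p, v in zip(prompt_emb, vector)]


def sweep(prompt_emb: Sequence[float], vector: Sequence[float],
          low: float, high: float, steps: int) -> List[Tuple[float, Vector]]:
    """Steer at evenly spaced strengths in [low, high].

    Assumption: low and high are given; in the paper they would come from
    the elastic range search, which is not reproduced here.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    strengths = [low + (high - low) * k / (steps - 1) for k in range(steps)]
    return [(s, steer(prompt_emb, vector, s)) for s in strengths]
```

In practice each steered embedding returned by `sweep` would be passed to the generator in place of the original prompt embedding, producing one output per edit strength.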