FilterPrompt: Guiding Image Transfer in Diffusion Models

In controllable generation tasks, flexibly manipulating generated images to attain a desired appearance or structure from a single input image cue remains a critical and longstanding challenge. Achieving this requires effectively decoupling the key attributes of the input image so that an accurate representation of each can be obtained. Previous research has predominantly concentrated on disentangling image attributes in feature space. However, the complex distributions of real-world data often make such decoupling algorithms difficult to apply to other datasets. Moreover, the granularity of control over feature encoding frequently fails to meet specific task requirements. After examining the characteristics of various generative models, we observe that the input sensitivity and dynamic evolution properties of the diffusion model can be effectively combined with explicit decomposition operations in pixel space. This combination allows image processing operations applied in pixel space to a specific feature distribution of the input image to produce the desired control effect in the generated results. We therefore propose FilterPrompt, an approach for enhancing the model's control effect. It can be applied to any diffusion model, allowing users to adjust the representation of specific image features according to task requirements, thereby enabling more precise and controllable generation. In particular, our experiments demonstrate that FilterPrompt optimizes feature correlation, mitigates content conflicts during generation, and enhances the model's control capability.
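To make the idea of explicit decomposition in pixel space concrete, the following is a minimal sketch, not the method's actual implementation. It assumes that an input image cue is passed through a chain of ordinary image filters before being handed to a diffusion model as a conditioning image. The two filters (decolorization and a mean blur) and the names `to_grayscale`, `box_blur`, and `apply_pixel_filters` are hypothetical and chosen only for illustration; the abstract does not specify which filters are used, and the diffusion model call itself is omitted.

```python
import numpy as np


def to_grayscale(image):
    """Remove color information: replace each pixel by its luminance
    (ITU-R BT.601 weights), replicated over three channels."""
    weights = np.array([0.299, 0.587, 0.114])
    luma = image[..., :3] @ weights
    return np.repeat(luma[..., None], 3, axis=-1)


def box_blur(image, radius=1):
    """Suppress fine texture with a (2*radius+1)^2 mean filter.
    Borders are handled by edge padding."""
    padded = np.pad(
        image, ((radius, radius), (radius, radius), (0, 0)), mode="edge"
    )
    size = 2 * radius + 1
    height, width = image.shape[:2]
    out = np.zeros(image.shape, dtype=np.float64)
    # Sum the shifted copies of the padded image, then average.
    for dy in range(size):
        for dx in range(size):
            out += padded[dy:dy + height, dx:dx + width]
    return out / (size * size)


def apply_pixel_filters(image, filters):
    """Hypothetical helper: apply pixel-space filters in order to an
    H x W x 3 float image before it is used as a conditioning input."""
    out = np.asarray(image, dtype=np.float64)
    for pixel_filter in filters:
        out = pixel_filter(out)
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    cue = rng.random((64, 64, 3))  # stand-in for a real input image in [0, 1]
    structure_cue = apply_pixel_filters(cue, [to_grayscale, box_blur])
    print(structure_cue.shape)
```

Under these assumptions, choosing which filters go into the list is the user's control knob: removing color or texture from the cue is one way to keep such attributes from competing with those supplied by another input.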
