We present DialogPaint, a framework for image editing through interactive conversation. The framework combines a pretrained dialogue model (Blenderbot) with a diffusion model (Stable Diffusion). The dialogue model converses with the user to understand their requirements and then condenses the dialogue into a concise editing instruction. The Stable Diffusion model takes this instruction, together with the input image, and produces the edited output. Because fine-tuning data for such models is difficult to acquire, we use multiple large-scale models to generate simulated dialogues and corresponding image pairs. After fine-tuning the framework on this synthesized data, we evaluate it in real-world application scenarios. The results show that DialogPaint performs strongly on both objective and subjective evaluation metrics, handles ambiguous instructions effectively, and carries out tasks such as object replacement, style transfer, and color modification. The framework also supports multi-round editing, which allows complicated editing tasks to be completed step by step.
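The two-stage flow described above (dialogue to instruction, then instruction plus image to edited image, repeated over several rounds) can be sketched as follows. This is a minimal illustration only, not the authors' implementation: `EditSession`, `edit_round`, `toy_dialogue_model`, and `toy_image_editor` are hypothetical names, the toy functions stand in for the fine-tuned Blenderbot and Stable Diffusion models, and the sketch assumes one instruction per user turn, whereas the actual dialogue model may ask clarifying questions first.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical type aliases for this sketch (not from the paper).
Image = object                                  # any image representation
DialogueModel = Callable[[List[str]], str]      # dialogue history -> concise instruction
ImageEditor = Callable[[Image, str], Image]     # (image, instruction) -> edited image


@dataclass
class EditSession:
    """State of one multi-round editing conversation."""
    image: Image
    history: List[str] = field(default_factory=list)
    instructions: List[str] = field(default_factory=list)


def edit_round(session: EditSession, user_turn: str,
               dialogue_model: DialogueModel,
               image_editor: ImageEditor) -> Image:
    """Run one round: record the user turn, derive an instruction, edit the image."""
    session.history.append(user_turn)
    instruction = dialogue_model(session.history)
    session.instructions.append(instruction)
    # The edited image becomes the input of the next round (multi-round editing).
    session.image = image_editor(session.image, instruction)
    return session.image


# Toy stand-ins so the sketch runs without any model weights.
def toy_dialogue_model(history: List[str]) -> str:
    # Assumption: the latest user turn is simply normalized into an instruction.
    return "instruction: " + history[-1].lower()


def toy_image_editor(image: Image, instruction: str) -> Image:
    # Assumption: an "image" is a string that records the edits applied to it.
    return f"{image} -> [{instruction}]"


if __name__ == "__main__":
    session = EditSession(image="img0")
    edit_round(session, "Make the sky purple", toy_dialogue_model, toy_image_editor)
    edit_round(session, "Replace the dog with a cat", toy_dialogue_model, toy_image_editor)
    print(session.image)
```

Keeping the two models behind plain callables means the toy stand-ins could be swapped for real model wrappers without changing the round logic; how the real models are loaded and prompted is outside the scope of this sketch.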