Despite the significant progress in diffusion prior-based image restoration, most existing methods apply uniform processing to the entire image and lack the capability to perform region-customized restoration according to user instructions. In this work, we propose a new framework, namely InstructRestore, to perform region-adjustable image restoration following human instructions. To this end, we first develop a data generation engine that produces training triplets, each consisting of a high-quality image, a target region description, and the corresponding region mask. With this engine and careful data screening, we construct a comprehensive dataset of 536,945 triplets to support the training and evaluation of this task. We then examine how to integrate the low-quality image features under the ControlNet architecture to adjust the degree of image detail enhancement. Consequently, we develop a ControlNet-like model that identifies the target region and allocates different integration scales to the target and surrounding regions, enabling region-customized image restoration that aligns with user instructions. Experimental results demonstrate that InstructRestore achieves effective human-instructed image restoration, such as producing images with bokeh effects and performing user-instructed local enhancement. Our work advances the investigation of interactive image restoration and enhancement techniques. Data, code, and models will be available at https://github.com/shuaizhengliu/InstructRestore.git.
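To make the region-wise integration idea concrete, the sketch below blends control-branch features into backbone features with different scales inside and outside the target region mask. The function name, the additive fusion, and the specific scale values are illustrative assumptions, not the paper's actual implementation.

```python
import torch
import torch.nn.functional as F


def fuse_control_features(backbone_feat, control_feat, region_mask,
                          scale_target=1.0, scale_background=0.5):
    """Blend control features into backbone features with region-wise scales.

    backbone_feat, control_feat: (B, C, H, W) feature maps at the same resolution.
    region_mask: (B, 1, H0, W0) mask in [0, 1]; 1 marks the user-instructed target region.
    scale_target / scale_background: integration strengths for the target and
    surrounding regions (hypothetical values chosen for illustration).
    """
    # Resize the mask to the feature resolution and build a per-pixel scale map.
    mask = F.interpolate(region_mask, size=control_feat.shape[-2:], mode="nearest")
    scale_map = scale_target * mask + scale_background * (1.0 - mask)
    # Simple additive fusion, modulated by the region-dependent scale.
    return backbone_feat + scale_map * control_feat


# Toy usage: stronger detail enhancement inside a central target region.
backbone = torch.randn(1, 64, 32, 32)
control = torch.randn(1, 64, 32, 32)
mask = torch.zeros(1, 1, 256, 256)
mask[..., 64:192, 64:192] = 1.0
fused = fuse_control_features(backbone, control, mask)
print(fused.shape)  # torch.Size([1, 64, 32, 32])
```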