In this paper, we explore a new task called small object editing (SOE), which focuses on text-based image inpainting within a constrained, small-sized area. Despite the remarkable success achieved by current image inpainting approaches, applying them to the SOE task generally results in failure cases such as Object Missing, Text-Image Mismatch, and Distortion. These failures stem from the limited representation of small-sized objects in training datasets and from the downsampling operations employed by U-Net models, which hinder accurate generation. To overcome these challenges, we introduce SOEDiff, a novel training-based approach that enhances the ability of baseline models such as StableDiffusion to edit small-sized objects while minimizing training costs. Specifically, our method involves two key components: SO-LoRA, which efficiently fine-tunes low-rank matrices, and a Cross-Scale Score Distillation loss, which leverages high-resolution predictions from the pre-trained teacher diffusion model. Our method yields significant improvements on the test datasets collected from MSCOCO and OpenImage, validating its effectiveness for small object editing. In particular, compared with the SD-I model on the OpenImage-f dataset, SOEDiff improves CLIP-Score by 0.99 and reduces FID by 2.87. Our project page can be found at https://soediff.github.io/.
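The abstract states only that SO-LoRA "efficiently fine-tunes low-rank matrices"; it does not specify which layers are adapted, the rank, or any SOE-specific modifications. As a rough illustration of the underlying idea, the sketch below shows a generic LoRA adapter in NumPy, not the authors' SO-LoRA implementation. The class name `LoRALinear`, the rank, the `alpha` scaling, and the 320-wide layer are all illustrative assumptions. The pre-trained weight stays frozen, and only two small factors `A` and `B` would be trained.

```python
import numpy as np


class LoRALinear:
    """Generic LoRA sketch (hypothetical helper): frozen weight + trainable low-rank update."""

    def __init__(self, weight: np.ndarray, rank: int = 4, alpha: float = 4.0, seed: int = 0):
        rng = np.random.default_rng(seed)
        d_out, d_in = weight.shape
        self.weight = weight  # frozen pre-trained weight, never updated
        # Trainable factors: A starts small and random, B starts at zero, so the
        # adapted layer initially reproduces the pre-trained layer exactly.
        self.A = rng.normal(0.0, 0.01, size=(rank, d_in))
        self.B = np.zeros((d_out, rank))
        self.scale = alpha / rank  # assumed scaling convention alpha / rank

    def __call__(self, x: np.ndarray) -> np.ndarray:
        base = x @ self.weight.T                  # frozen path
        update = (x @ self.A.T) @ self.B.T        # low-rank path: rank-r bottleneck
        return base + self.scale * update

    def trainable_params(self) -> int:
        # Only A and B would receive gradients during fine-tuning.
        return self.A.size + self.B.size


# Usage: a 320x320 projection (width chosen for illustration only).
rng = np.random.default_rng(1)
W = rng.normal(size=(320, 320))
layer = LoRALinear(W, rank=4)
x = rng.normal(size=(2, 320))
print(np.allclose(layer(x), x @ W.T))   # True: zero-initialized B leaves outputs unchanged
print(layer.trainable_params(), W.size)  # 2560 102400
```

With rank 4, the adapter trains 2,560 parameters instead of the 102,400 in the full matrix, which illustrates why low-rank fine-tuning keeps training costs low.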