SOEDiff: Efficient Distillation for Small Object Editing

In this paper, we study a new task, small object editing (SOE), which focuses on text-based image inpainting within a constrained, small-sized area. Despite the remarkable success achieved by current image inpainting approaches, applying them to the SOE task generally produces failure cases such as Object Missing, Text-Image Mismatch, and Distortion. These failures stem from two sources: the limited number of small-sized objects in training datasets, and the downsampling operations employed by U-Net models, which hinder accurate generation. To overcome these challenges, we introduce SOEDiff, a novel training-based approach that enhances the ability of baseline models such as StableDiffusion to edit small-sized objects while minimizing training costs. Specifically, our method involves two key components: SO-LoRA, which efficiently fine-tunes low-rank matrices, and a Cross-Scale Score Distillation loss, which leverages high-resolution predictions from the pre-trained teacher diffusion model. Our method yields significant improvements on test datasets collected from MSCOCO and OpenImage, validating its effectiveness for small object editing. In particular, compared with the SD-I model on the OpenImage-f dataset, SOEDiff improves CLIP-Score by 0.99 and reduces FID by 2.87. Our project page can be found at https://soediff.github.io/.
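The abstract names the two components but gives no formulas. A minimal NumPy sketch of one possible reading follows. It assumes that SO-LoRA uses the standard LoRA parameterization: a frozen weight W plus a trainable update (alpha/r)·B·A, with B initialized to zero. It also assumes the Cross-Scale Score Distillation loss compares the student's noise prediction on a small region against a frozen teacher's prediction on the same region upsampled to the teacher's native resolution and pooled back down. The names `LoRALinear`, `cross_scale_distill_loss`, and `teacher_fn`, the rank and alpha values, and the nearest-neighbour/average-pooling resize choices are all hypothetical illustrations, not the paper's implementation.

```python
import numpy as np


class LoRALinear:
    """Hypothetical SO-LoRA-style layer: frozen weight W plus a trainable
    low-rank update scale * B @ A (standard LoRA parameterization, assumed)."""

    def __init__(self, weight, rank=4, alpha=4.0, seed=0):
        rng = np.random.default_rng(seed)
        out_dim, in_dim = weight.shape
        self.weight = weight                                  # frozen pre-trained weight
        self.A = rng.normal(0.0, 0.01, size=(rank, in_dim))   # trainable, small random init
        self.B = np.zeros((out_dim, rank))                    # trainable, zero init
        self.scale = alpha / rank

    def __call__(self, x):
        # x: (batch, in_dim) -> (batch, out_dim); with B = 0 this equals the base layer
        return x @ self.weight.T + self.scale * (x @ self.A.T) @ self.B.T

    def num_trainable(self):
        # Only A and B are trained; W stays frozen
        return self.A.size + self.B.size


def upsample_nearest(x, factor):
    # (C, H, W) -> (C, H*factor, W*factor) by repeating each pixel
    return x.repeat(factor, axis=1).repeat(factor, axis=2)


def downsample_avg(x, factor):
    # (C, H, W) -> (C, H/factor, W/factor) by average pooling over factor x factor blocks
    c, h, w = x.shape
    return x.reshape(c, h // factor, factor, w // factor, factor).mean(axis=(2, 4))


def cross_scale_distill_loss(student_eps, teacher_fn, noisy_crop, factor):
    """Hypothetical cross-scale loss (assumed form): the frozen teacher predicts
    on the upsampled crop, its prediction is pooled back to the student's
    scale, and the student is penalized by the mean squared difference."""
    teacher_hr = teacher_fn(upsample_nearest(noisy_crop, factor))
    target = downsample_avg(teacher_hr, factor)
    return float(np.mean((student_eps - target) ** 2))
```

For a 320×320 projection with rank 4, only 4·320 + 320·4 = 2,560 parameters are trainable, versus 102,400 in the full weight matrix. This is the kind of saving the abstract's "minimizing training costs" points to. Zero-initializing B means fine-tuning starts exactly from the pre-trained model's behaviour.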

翻译：本文深入探讨了一项名为小目标编辑（Small Object Editing, SOE）的新任务，该任务聚焦于在受限的小尺寸区域内进行基于文本的图像修复。尽管当前的图像修复方法已取得显著成功，但将其应用于SOE任务时通常会导致诸如目标缺失、文本-图像不匹配和畸变等失败情况。这些失败源于训练数据集中小尺寸目标使用有限，以及U-Net模型采用的下采样操作阻碍了精准生成。为克服这些挑战，我们提出了一种基于训练的新方法SOEDiff，旨在增强诸如StableDiffusion等基线模型编辑小尺寸目标的能力，同时最小化训练成本。具体而言，我们的方法包含两个关键组件：SO-LoRA（高效微调低秩矩阵）和跨尺度分数蒸馏损失（利用预训练教师扩散模型的高分辨率预测）。该方法在从MSCOCO和OpenImage收集的测试数据集上取得了显著改进，验证了所提方法在小目标编辑中的有效性。特别是，将SOEDiff与SD-I模型在OpenImage-f数据集上进行比较时，我们观察到CLIP-Score提升了0.99，FID降低了2.87。项目页面请见https://soediff.github.io/。