The rise in quality of generative models over the past years has enabled the generation of edited variations of images at scale. To counter the harmful effects of such technology, the Image Difference Captioning (IDC) task aims to describe the differences between two images. While this task is handled successfully for simple 3D-rendered images, it struggles on real-world images. The reason is twofold: the scarcity of training data, and the difficulty of capturing fine-grained differences between complex images. To address these issues, we propose in this paper a simple yet effective framework to both adapt existing image captioning models to the IDC task and augment IDC datasets. We introduce BLIP2IDC, an adaptation of BLIP2 to the IDC task at low computational cost, and show that it outperforms two-stream approaches by a significant margin on real-world IDC datasets. We also propose to use synthetic augmentation to improve the performance of IDC models in a model-agnostic fashion. We show that our synthetic augmentation strategy provides high-quality data, leading to a challenging new dataset well suited for IDC, named Syned1.