Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder - 专知论文

会员服务 ·

0

图像检索 · 情景 · FAST · 向量化 · 相互独立的 ·

2023 年 5 月 25 日

Candidate Set Re-ranking for Composed Image Retrieval with Dual Multi-modal Encoder

翻译：基于双多模态编码器的组合图像检索候选集重排序

Zheyuan Liu,Weixuan Sun,Damien Teney,Stephen Gould

from arxiv, 14 pages, including supplementary material

Composed image retrieval aims to find an image that best matches a given multi-modal user query consisting of a reference image and text pair. Existing methods commonly pre-compute image embeddings over the entire corpus and compare these to a reference image embedding modified by the query text at test time. Such a pipeline is very efficient at test time since fast vector distances can be used to evaluate candidates, but modifying the reference image embedding guided only by a short textual description can be difficult, especially independent of potential candidates. An alternative approach is to allow interactions between the query and every possible candidate, i.e., reference-text-candidate triplets, and pick the best from the entire set. Though this approach is more discriminative, for large-scale datasets the computational cost is prohibitive since pre-computation of candidate embeddings is no longer possible. We propose to combine the merits of both schemes using a two-stage model. Our first stage adopts the conventional vector distancing metric and performs a fast pruning among candidates. Meanwhile, our second stage employs a dual-encoder architecture, which effectively attends to the input triplet of reference-text-candidate and re-ranks the candidates. Both stages utilize a vision-and-language pre-trained network, which has proven beneficial for various downstream tasks. Our method consistently outperforms state-of-the-art approaches on standard benchmarks for the task.

翻译：组合图像检索旨在搜索与给定多模态用户查询（包含参考图像和文本描述对）最匹配的图像。现有方法通常预先对整个语料库的图像进行嵌入编码，并在测试时将查询文本修正后的参考图像嵌入与这些预计算嵌入进行距离比较。这种流水线测试时效率较高（可利用快速向量距离评估候选集），但仅通过简短文本描述修正参考图像嵌入存在困难，尤其当该过程独立于潜在候选集时。另一种替代方案是允许查询与每个候选图像进行交互（即参考图像-文本-候选三元组），并从全集中筛选最优结果。尽管该方法更具判别性，但因其无法预计算候选嵌入，大规模数据集的计算代价将难以承受。本文提出结合两种方案优点的两阶段模型：第一阶段采用传统向量距离度量对候选集进行快速剪枝；第二阶段基于双编码器架构，通过有效关注输入三元组（参考图像、文本、候选图像）对候选集进行重排序。两个阶段均采用经视觉与语言预训练的网络模型，该技术已被证实在多种下游任务中效果显著。在标准基准测试中，本方法性能持续超越当前最优方案。

0

相关内容

图像检索

从20世纪70年代开始，有关图像检索的研究就已开始，当时主要是基于文本的图像检索技术（Text-based Image Retrieval，简称TBIR），利用文本描述的方式描述图像的特征，如绘画作品的作者、年代、流派、尺寸等。到90年代以后，出现了对图像的内容语义，如图像的颜色、纹理、布局等进行分析和检索的图像检索技术，即基于内容的图像检索（Content-based Image Retrieval，简称CBIR）技术。CBIR属于基于内容检索（Content-based Retrieval，简称CBR）的一种，CBR中还包括对动态视频、音频等其它形式多媒体信息的检索技术。

知识荟萃

精品入门和进阶教程、论文和代码整理等

更多

查看相关VIP内容、论文、资讯等

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

miR-628促进烧伤后胰岛素抵抗和骨骼肌消耗的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

miR-140促进CYP2J2基因表达对动脉粥样硬化中血管炎症的调控作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Corin介导的ANP活化在动脉粥样硬化形成及其炎症反应中的作用与机制

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常在颌骨骨化纤维瘤发病中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

153Gd-DOTA-Octreotide MR/SPECT单核心双模态小分子探针构建及人肝细胞癌/肺癌裸鼠双瘤模型定量显像研究

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常对颌骨骨化纤维瘤发病的作用

国家自然科学基金

0+阅读 · 2011年12月31日

炎症微环境在高血压心脏纤维化发病机制中的研究

国家自然科学基金

0+阅读 · 2011年12月31日

集卫生服务和社会服务于一体的城市老年居家卫生服务体系研究

国家自然科学基金

0+阅读 · 2009年12月31日

强非均质铀储层中压力波动地浸的流动传质机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction

Arxiv

0+阅读 · 2023年7月14日

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

Arxiv

0+阅读 · 2023年7月13日

Latent Graph Attention for Enhanced Spatial Context

Arxiv

0+阅读 · 2023年7月12日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

Semantic Models for the First-stage Retrieval: A Comprehensive Review

Arxiv

20+阅读 · 2021年9月17日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Detect-to-Retrieve: Efficient Regional Aggregation for Image Search

Arxiv

15+阅读 · 2018年12月4日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

VIP会员

文章信息

相关主题

相互独立的

最新内容

《无人机空中监控：通信实验洞察》

《无人机空中监控：通信实验洞察》

专知会员服务

0+阅读 · 3分钟前

《无全球定位系统及通信拒止环境下用于地面目标防护的分布式无人机蜂群》（含代码）

《无全球定位系统及通信拒止环境下用于地面目标防护的分布式无人机蜂群》（含代码）

专知会员服务

0+阅读 · 9分钟前

从采集到决策：美军视角下的战术情报范式重构

从采集到决策：美军视角下的战术情报范式重构

专知会员服务

11+阅读 · 8月2日

乌克兰“德尔塔”系统揭示无人机、数据与领导力如何重塑现代安全格局

乌克兰“德尔塔”系统揭示无人机、数据与领导力如何重塑现代安全格局

专知会员服务

5+阅读 · 8月2日

大规模作战中的参谋流程：作为联合兵种作战组成部分的目标锁定

大规模作战中的参谋流程：作为联合兵种作战组成部分的目标锁定

专知会员服务

9+阅读 · 8月2日

《北约概念开发与实验（CD&E）手册：概念开发者工具箱》100页手册

《北约概念开发与实验（CD&E）手册：概念开发者工具箱》100页手册

专知会员服务

10+阅读 · 8月2日

《履带式无人地面战车技术发展现状》

《履带式无人地面战车技术发展现状》

专知会员服务

5+阅读 · 8月2日

《美国空军B-2“幽灵”隐身轰炸机系统工程案例研究》117页

《美国空军B-2“幽灵”隐身轰炸机系统工程案例研究》117页

专知会员服务

9+阅读 · 8月1日

隐身技术前沿综述：物理机理、工程实践与战略展望

隐身技术前沿综述：物理机理、工程实践与战略展望

专知会员服务

7+阅读 · 8月1日

《多变海洋环境下无人水面艇与自主水下机器人对接的最优路径规划》

《多变海洋环境下无人水面艇与自主水下机器人对接的最优路径规划》

专知会员服务

7+阅读 · 8月1日

《以机反机：基于无人机载麦克风的空中周界入侵检测》

《以机反机：基于无人机载麦克风的空中周界入侵检测》

专知会员服务

7+阅读 · 8月1日

《无人机脆弱性利用：网络空间力量的新域》

《无人机脆弱性利用：网络空间力量的新域》

专知会员服务

5+阅读 · 8月1日

美空军如何将人工智能从战场部署至后方机关

美空军如何将人工智能从战场部署至后方机关

专知会员服务

13+阅读 · 7月31日

《美战争部指令文件：网络空间效应与使能能力测试评估》

《美战争部指令文件：网络空间效应与使能能力测试评估》

专知会员服务

9+阅读 · 7月31日

《史诗怒火行动：多域前瞻评估》49页报告

《史诗怒火行动：多域前瞻评估》49页报告

专知会员服务

12+阅读 · 7月31日

相关VIP内容

50+篇《神经架构搜索NAS》2020论文合集

专知会员服务

61+阅读 · 2020年3月19日

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

100+篇《自监督学习(Self-Supervised Learning)》论文最新合集

专知会员服务

167+阅读 · 2020年3月18日

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

图像分类技巧集，17页ppt《Bag of Tricks for Image Classification》

专知会员服务

96+阅读 · 2020年3月12日

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

Aspect-Oriented Syntax Network for Aspect-Based Sentiment Analysis，中山大学数据科学与计算机学院权小军教授，第八届全国社会媒体处理大会SMP2019

专知会员服务

19+阅读 · 2019年10月22日

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

Auto-Sizing the Transformer Network: Improving Speed, Efficiency, and Performance for Low-Resource Machine Translation

专知会员服务

50+阅读 · 2019年10月17日

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

Connections between Support Vector Machines, Wasserstein distance and gradient-penalty GANs

专知会员服务

37+阅读 · 2019年10月17日

强化学习最新教程，17页pdf

强化学习最新教程，17页pdf

专知会员服务

182+阅读 · 2019年10月11日

[综述]深度学习下的场景文本检测与识别

[综述]深度学习下的场景文本检测与识别

专知会员服务

78+阅读 · 2019年10月10日

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

【加州大学伯克利分校博士论文】通过自我监督预测学习泛化

专知会员服务

65+阅读 · 2019年10月9日

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

【SIGGRAPH2019】TensorFlow 2.0深度学习计算机图形学应用

专知会员服务

41+阅读 · 2019年10月9日

热门VIP内容

开通专知VIP会员享更多权益服务

从采集到决策：美军视角下的战术情报范式重构

大规模作战中的参谋流程：作为联合兵种作战组成部分的目标锁定

《无全球定位系统及通信拒止环境下用于地面目标防护的分布式无人机蜂群》（含代码）

乌克兰“德尔塔”系统揭示无人机、数据与领导力如何重塑现代安全格局

相关资讯

Hierarchically Structured Meta-learning

Hierarchically Structured Meta-learning

CreateAMind

27+阅读 · 2019年5月22日

Transferring Knowledge across Learning Processes

Transferring Knowledge across Learning Processes

CreateAMind

29+阅读 · 2019年5月18日

强化学习的Unsupervised Meta-Learning

强化学习的Unsupervised Meta-Learning

CreateAMind

18+阅读 · 2019年1月7日

Unsupervised Learning via Meta-Learning

Unsupervised Learning via Meta-Learning

CreateAMind

44+阅读 · 2019年1月3日

A Technical Overview of AI & ML in 2018 & Trends for 2019

A Technical Overview of AI & ML in 2018 & Trends for 2019

待字闺中

18+阅读 · 2018年12月24日

LibRec 精选：推荐系统的论文与源码

LibRec 精选：推荐系统的论文与源码

LibRec智能推荐

14+阅读 · 2018年11月29日

disentangled-representation-papers

disentangled-representation-papers

CreateAMind

26+阅读 · 2018年9月12日

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

【代码资源】GAN | 七份最热GAN文章及代码分享（Github 1000+Stars）

专知

13+阅读 · 2018年6月24日

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

【论文推荐】最新六篇视觉问答相关论文—深度嵌入学习、句子表征学习、深度特征聚合、3D匹配、细粒度文本摘要

专知

12+阅读 · 2018年6月9日

【推荐】自然语言处理（NLP）指南

【推荐】自然语言处理（NLP）指南

机器学习研究会

35+阅读 · 2017年11月17日

相关论文

ICL-D3IE: In-Context Learning with Diverse Demonstrations Updating for Document Information Extraction

Arxiv

0+阅读 · 2023年7月14日

Domain-Agnostic Tuning-Encoder for Fast Personalization of Text-To-Image Models

Arxiv

0+阅读 · 2023年7月13日

Latent Graph Attention for Enhanced Spatial Context

Arxiv

0+阅读 · 2023年7月12日

Conditional Prompt Learning for Vision-Language Models

Conditional Prompt Learning for Vision-Language Models

Arxiv

13+阅读 · 2022年3月10日

Updating Embeddings for Dynamic Knowledge Graphs

Arxiv

20+阅读 · 2021年9月22日

Semantic Models for the First-stage Retrieval: A Comprehensive Review

Arxiv

20+阅读 · 2021年9月17日

End-to-End Video Instance Segmentation with Transformers

Arxiv

10+阅读 · 2021年3月24日

PROP: Pre-training with Representative Words Prediction for Ad-hoc Retrieval

Arxiv

11+阅读 · 2020年10月20日

Detect-to-Retrieve: Efficient Regional Aggregation for Image Search

Arxiv

15+阅读 · 2018年12月4日

DeepSeek: Content Based Image Search & Retrieval

Arxiv

13+阅读 · 2018年1月11日

相关基金

miR-628促进烧伤后胰岛素抵抗和骨骼肌消耗的机制研究

国家自然科学基金

0+阅读 · 2014年12月31日

miR-140促进CYP2J2基因表达对动脉粥样硬化中血管炎症的调控作用及机制研究

国家自然科学基金

0+阅读 · 2013年12月31日

Corin介导的ANP活化在动脉粥样硬化形成及其炎症反应中的作用与机制

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常在颌骨骨化纤维瘤发病中的作用

国家自然科学基金

0+阅读 · 2012年12月31日

153Gd-DOTA-Octreotide MR/SPECT单核心双模态小分子探针构建及人肝细胞癌/肺癌裸鼠双瘤模型定量显像研究

国家自然科学基金

0+阅读 · 2012年12月31日

CDC73基因异常对颌骨骨化纤维瘤发病的作用

国家自然科学基金

0+阅读 · 2011年12月31日

炎症微环境在高血压心脏纤维化发病机制中的研究

国家自然科学基金

0+阅读 · 2011年12月31日

集卫生服务和社会服务于一体的城市老年居家卫生服务体系研究

国家自然科学基金

0+阅读 · 2009年12月31日

强非均质铀储层中压力波动地浸的流动传质机理研究

国家自然科学基金

0+阅读 · 2009年12月31日

磁性Pickering乳液界面流变学研究

国家自然科学基金

0+阅读 · 2008年12月31日

微信扫码咨询专知VIP会员