Delving into E-Commerce Product Retrieval with Vision-Language Pre-training


Vision-language pre-training · E-commerce products · E-commerce · Pre-training · Taobao Search

April 10, 2023


Xiaoyang Zheng, Fuyu Lv, Zilong Wang, Qingwen Liu, Xiaoyi Zeng

From arXiv; 5 pages, 4 figures; accepted to SIRIP 2023

E-commerce search engines comprise a retrieval phase and a ranking phase; the retrieval phase returns a candidate product set for a given user query. Recently, vision-language pre-training, which combines textual information with visual clues, has become popular for retrieval tasks. In this paper, we propose a novel V+L pre-training method to solve the retrieval problem in Taobao Search. We design a visual pre-training task based on contrastive learning, which outperforms common regression-based visual pre-training tasks. We also adopt two negative sampling schemes tailored to the large-scale retrieval task, and we describe how the proposed method is deployed online in real-world situations. Extensive offline and online experiments demonstrate the superior performance of our method on the retrieval task. The proposed method is employed as one retrieval channel of Taobao Search and serves hundreds of millions of users in real time.
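The abstract does not spell out the contrastive objective or the two negative sampling schemes, so the following is only a generic sketch of contrastive learning with in-batch negatives (an InfoNCE-style loss), not the authors' method. The function names, the temperature value, and the toy embeddings are all assumptions made for illustration: row i of the query list is treated as the positive match for row i of the item list, and every other item in the batch serves as a negative.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products become cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def info_nce_loss(query_embs, item_embs, temperature=0.07):
    """Average InfoNCE loss over a batch.

    query_embs[i] and item_embs[i] form a positive pair; all other items in
    the batch are used as negatives for query i (in-batch negative sampling).
    The temperature default is an illustrative assumption.
    """
    queries = [normalize(q) for q in query_embs]
    items = [normalize(v) for v in item_embs]
    total = 0.0
    for i, q in enumerate(queries):
        # Similarity of query i to every item in the batch, scaled by temperature.
        logits = [sum(a * b for a, b in zip(q, v)) / temperature for v in items]
        # Numerically stable log-sum-exp for the softmax denominator.
        max_logit = max(logits)
        log_denominator = max_logit + math.log(
            sum(math.exp(l - max_logit) for l in logits)
        )
        # Negative log-probability assigned to the positive item.
        total += log_denominator - logits[i]
    return total / len(queries)


# Toy batch: three orthogonal "product" embeddings (hypothetical data).
items = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
matched = info_nce_loss(items, items)                     # queries equal their positives
mismatched = info_nce_loss(items[1:] + items[:1], items)  # every query points at a negative
print(matched < mismatched)
```

In this toy setup the loss is close to zero when each query embedding coincides with its positive item and grows large when every query aligns with a negative instead, which is the behaviour a contrastive retrieval objective relies on.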



