To improve relevance scoring on Pinterest Search, we integrate Large Language Models (LLMs) into our search relevance model, leveraging carefully designed text representations to predict the relevance of Pins effectively. Our approach uses search queries alongside content representations that include captions extracted from a generative visual language model. These are further enriched with link-based text data, historically high-quality engaged queries, user-curated boards, Pin titles, and Pin descriptions, creating robust models for predicting search relevance. We use a semi-supervised learning approach to scale training data efficiently, expanding beyond the expensive human-labeled data available. By utilizing multilingual LLMs, our system extends training data to unseen languages and domains, even though the initial data and annotator expertise were confined to English. Furthermore, we distill the LLM-based model into real-time servable model architectures and features. We provide comprehensive offline experimental validation of our proposed techniques and demonstrate the gains achieved by the final system deployed at scale.
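The distillation step described above can be illustrated with a minimal sketch: a lightweight "student" model is fit to soft relevance scores produced by a larger LLM-based "teacher", so that only the cheap student needs to run at serving time. All names, feature vectors, and teacher scores below are hypothetical illustrations, not Pinterest's actual models or data; the student here is a tiny logistic regression trained by SGD on soft labels.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def distill(features, teacher_scores, lr=0.5, epochs=200):
    """Fit a tiny logistic-regression 'student' to soft teacher labels
    by SGD on the cross-entropy between student and teacher scores."""
    dim = len(features[0])
    w = [0.0] * dim
    b = 0.0
    for _ in range(epochs):
        for x, t in zip(features, teacher_scores):
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - t  # gradient of cross-entropy w.r.t. the logit
            for i in range(dim):
                w[i] -= lr * g * x[i]
            b -= lr * g
    return w, b

# Toy (query, Pin) feature vectors and hypothetical soft relevance
# scores from an LLM-based teacher (values in [0, 1]).
feats = [[1.0, 0.1], [0.9, 0.2], [0.1, 1.0], [0.2, 0.8]]
teacher = [0.95, 0.90, 0.10, 0.20]

w, b = distill(feats, teacher)
student = [sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in feats]
```

In a production setting the same idea applies with a neural student, a KL or cross-entropy loss against teacher logits, and teacher scores generated offline over large unlabeled (query, Pin) pairs, which is also how the semi-supervised expansion beyond human labels works.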