KuaiSAR: A Unified Search And Recommendation Dataset

Search and Recommendation (S&R) services converge in many online services, including e-commerce and video platforms, and modeling the two jointly is an intuitive approach that industry practitioners have already adopted. Academic research in this area is nonetheless scarce, primarily because no suitable datasets are publicly available. As a result, a substantial gap has emerged between academia and industry in research on joint optimization using user behavior data from both S&R services. To bridge this gap, we introduce KuaiSAR, the first large-scale, real-world dataset of integrated Search And Recommendation behaviors, collected from Kuaishou, a leading short-video app in China with over 350 million daily active users. Previous research in this field has predominantly relied on publicly available semi-synthetic or simulated datasets with artificially fabricated search behaviors. In contrast, KuaiSAR contains genuine user behaviors: it records whether each interaction occurred within the search or the recommendation service, as well as users' transitions between the two services. This work supports joint modeling of S&R, the use of search data in recommender systems, and the use of recommendation data in search engines. Furthermore, because user-video interactions carry a variety of feedback labels, KuaiSAR also supports a broad range of tasks, including intent recommendation, multi-task learning, and modeling of long sequential multi-behavioral patterns. We believe this dataset will serve as a catalyst for innovative research and help bridge the gap between academia and industry in understanding S&R services in practical, real-world applications.
