KuaiSAR: A Unified Search And Recommendation Dataset

The integration of Search and Recommendation (S&R) services is vital to online platforms such as e-commerce and video apps. Jointly modeling S&R is an intuitive approach widely adopted by industry practitioners. However, academia has produced comparatively little research in this area, primarily due to the absence of publicly available datasets. Consequently, a substantial gap has emerged between academia and industry in research on joint optimization using user behavior data from both S&R services. To bridge this gap, we introduce KuaiSAR, the first large-scale, real-world dataset of integrated Search And Recommendation behaviors, collected from Kuaishou, a leading short-video app in China with over 350 million daily active users. Previous research in this field has predominantly relied on publicly available semi-synthetic or simulated datasets with artificially fabricated search behaviors. In contrast, KuaiSAR contains genuine user behaviors: it records whether each interaction occurred in the search or the recommendation service, as well as users' transitions between the two services. This supports joint modeling of S&R and the use of search data in recommender systems (and of recommendation data in search engines). Furthermore, because user-video interactions carry various feedback labels, KuaiSAR also supports a broad range of tasks, including intent recommendation, multi-task learning, and modeling of long sequential multi-behavior patterns. We believe this dataset will catalyze innovative research and help bridge the gap between academia and industry in understanding S&R services in practical, real-world applications.
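
The key property described above, that every interaction is tagged with the service it came from so that users' switches between search and recommendation can be recovered, can be illustrated with a small sketch. It assumes a hypothetical record layout (an `Interaction` with `user_id`, `item_id`, `timestamp`, and `service` fields) rather than KuaiSAR's actual file schema; it merges a search log and a recommendation log per user in time order and counts consecutive events whose service differs.

```python
from collections import Counter
from dataclasses import dataclass
from itertools import groupby
from typing import Iterable


# Hypothetical record layout for illustration only; KuaiSAR's actual
# file names and column names may differ.
@dataclass(frozen=True)
class Interaction:
    user_id: int
    item_id: int
    timestamp: int  # assumed comparable time value, e.g. Unix seconds
    service: str    # assumed to be "search" or "rec"


def count_transitions(search_log: Iterable[Interaction],
                      rec_log: Iterable[Interaction]) -> Counter:
    """Count (from_service, to_service) switches between consecutive events per user."""
    # Merge both logs and order events by user, then by time.
    merged = sorted([*search_log, *rec_log],
                    key=lambda r: (r.user_id, r.timestamp))
    counts: Counter = Counter()
    # groupby works here because the merged list is already sorted by user_id.
    for _, group in groupby(merged, key=lambda r: r.user_id):
        events = list(group)
        for prev, curr in zip(events, events[1:]):
            if prev.service != curr.service:
                counts[(prev.service, curr.service)] += 1
    return counts


if __name__ == "__main__":
    search = [Interaction(1, 10, 100, "search"), Interaction(2, 11, 50, "search")]
    rec = [Interaction(1, 20, 90, "rec"), Interaction(1, 21, 110, "rec"),
           Interaction(2, 22, 60, "rec")]
    print(count_transitions(search, rec))
```

In this toy input, user 1's merged sequence is rec, search, rec and user 2's is search, rec, so the sketch reports two search-to-recommendation switches and one recommendation-to-search switch. Events with equal timestamps keep their input order because Python's sort is stable; a real pipeline would need an explicit tie-breaking rule.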


Related content

Dataset


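A dataset (also called a data set or data collection) is a collection of data, usually presented in tabular form. Each column represents a particular variable, and each row corresponds to one member of the data set in question. The table lists a value of each variable for each member, such as the height and weight of an object, or the values of random numbers. Each value is known as a datum. A data set may contain data for one or more members, corresponding to the number of rows.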

[NeurIPS 2021] GraphFormers: GNN-nested Transformer models for representation learning on textual graphs

UCM "Introduction to Machine Learning" lecture notes (CSE176), 80-page PDF

Introduction to Linux, 96-slide PPT

FlowQA: Grasping Flow in History for Conversational Machine Comprehension
