The confluence of Search and Recommendation (S&R) services is vital to online services, including e-commerce and video platforms. Jointly modeling S&R is an intuitive approach adopted by industry practitioners. However, academic research in this area remains scarce, primarily because of the absence of publicly available datasets. Consequently, a substantial gap has emerged between academia and industry in research on joint optimization using user behavior data from both S&R services. To bridge this gap, we introduce KuaiSAR, the first large-scale, real-world dataset of integrated Search And Recommendation behaviors, collected from Kuaishou, a leading short-video app in China with over 350 million daily active users. Previous research in this field has predominantly relied on publicly available semi-synthetic or simulated datasets with artificially fabricated search behaviors. Distinct from these datasets, KuaiSAR contains genuine user behaviors, recording whether each interaction occurred within the search or the recommendation service, as well as users' transitions between the two services. This work supports joint modeling of S&R, as well as using search data to improve recommender systems (and recommendation data to improve search engines). Furthermore, because of the various feedback labels associated with user-video interactions, KuaiSAR also supports a broad range of tasks, including intent recommendation, multi-task learning, and modeling of long sequential multi-behavioral patterns. We believe this dataset will serve as a catalyst for innovative research and help bridge the gap between academia and industry in understanding S&R services in practical, real-world applications.