The convergence of search and recommendation (S&R) services is a vital aspect of online content platforms such as Kuaishou and TikTok. Joint modeling of S&R is a highly intuitive approach adopted by industry practitioners. However, academic research in this area remains scarce, primarily because of the lack of publicly available datasets. Consequently, a substantial gap has emerged between academia and industry in this field. To bridge this gap, we introduce KuaiSAR, the first large-scale, real-world dataset of integrated Search And Recommendation behaviors, collected from Kuaishou, a leading short-video app in China with over 300 million daily active users. Previous research in this field has predominantly relied on publicly available datasets that are semi-synthetic, with artificially fabricated search behaviors. In contrast, KuaiSAR records genuine user behaviors, whether each interaction occurred within the search or the recommendation service, and users' transitions between the two services. This work supports joint modeling of S&R, as well as the use of search data in recommenders (and of recommendation data in search engines). Additionally, owing to the diverse feedback labels on user-video interactions, KuaiSAR supports a wide range of other tasks, including intent recommendation, multi-task learning, and long-sequence multi-behavior modeling. We believe this dataset will facilitate innovative research and enrich our understanding of S&R service integration in real-world applications.