The confluence of Search and Recommendation (S&R) services is vital to online applications such as e-commerce and video platforms. Joint modeling of S&R is a highly intuitive approach that industry practitioners have adopted. However, academic research in this area remains noticeably scarce, primarily because no publicly available datasets exist. Consequently, a substantial gap has emerged between academia and industry in research on joint optimization using user behavior data from both S&R services. To bridge this gap, we introduce KuaiSAR, the first large-scale, real-world dataset of integrated Search And Recommendation behaviors, collected from Kuaishou, a leading short-video app in China with over 350 million daily active users. Previous research in this field has predominantly relied on publicly available semi-synthetic or simulated datasets with artificially fabricated search behaviors. Distinct from previous datasets, KuaiSAR contains genuine user behaviors, recording whether each interaction occurred in the search or the recommendation service, as well as users' transitions between the two services. This work supports joint modeling of S&R, as well as the use of search data for recommender systems (and of recommendation data for search engines). Furthermore, because user-video interactions carry various feedback labels, KuaiSAR also supports a broad range of tasks, including intent recommendation, multi-task learning, and modeling of long sequential multi-behavioral patterns. We believe this dataset will serve as a catalyst for innovative research and bridge the gap between academia and industry in understanding S&R services in practical, real-world applications.