Live streaming platforms have become a dominant form of online content consumption, offering dynamically evolving content, real-time interactions, and highly engaging user experiences. These unique characteristics introduce new challenges that differentiate live streaming recommendation from traditional recommendation settings and have garnered increasing attention from industry in recent years. However, research progress in academia has been hindered by the lack of publicly available datasets that accurately reflect the dynamic nature of live streaming environments. To address this gap, we introduce KuaiLive, the first real-time, interactive dataset collected from Kuaishou, a leading live streaming platform in China with over 400 million daily active users. The dataset records the interaction logs of 23,772 users and 452,621 streamers over a 21-day period. Compared to existing datasets, KuaiLive offers several advantages: it includes precise live room start and end timestamps, multiple types of real-time user interactions (click, comment, like, gift), and rich side information features for both users and streamers. These features enable more realistic simulation of dynamic candidate items and better modeling of user and streamer behaviors. We conduct a thorough analysis of KuaiLive from multiple perspectives and evaluate several representative recommendation methods on it, establishing a strong benchmark for future research. KuaiLive can support a wide range of tasks in the live streaming domain, such as top-K recommendation, click-through rate prediction, watch time prediction, and gift price prediction. Moreover, its fine-grained behavioral data also enables research on multi-behavior modeling, multi-task learning, and fairness-aware recommendation. The dataset and related resources are publicly available at https://imgkkk574.github.io/KuaiLive.
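To illustrate why precise live room start and end timestamps matter for simulating dynamic candidate items, the following minimal sketch filters a pool of live rooms down to those on air at the moment of a recommendation request. It is not the authors' pipeline: the `LiveRoom` class, its field names (`room_id`, `streamer_id`, `start_ts`, `end_ts`), the `live_candidates` helper, and the toy timestamps are all hypothetical and do not reflect the actual KuaiLive schema.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class LiveRoom:
    # All field names are assumptions for illustration, not the KuaiLive schema.
    room_id: str
    streamer_id: str
    start_ts: int  # broadcast start, e.g. Unix seconds
    end_ts: int    # broadcast end, e.g. Unix seconds


def live_candidates(rooms: List[LiveRoom], request_ts: int) -> List[LiveRoom]:
    """Return the rooms that are on air at request_ts.

    The interval is treated as start-inclusive and end-exclusive, which is
    an assumption of this sketch.
    """
    return [r for r in rooms if r.start_ts <= request_ts < r.end_ts]


# Toy data: three rooms with overlapping broadcast windows.
rooms = [
    LiveRoom("r1", "s1", 100, 200),
    LiveRoom("r2", "s2", 150, 400),
    LiveRoom("r3", "s3", 300, 500),
]

print([r.room_id for r in live_candidates(rooms, 160)])  # ['r1', 'r2']
print([r.room_id for r in live_candidates(rooms, 250)])  # ['r2']
```

Under this assumption the candidate set changes with every request time, unlike a static item catalog. An evaluation protocol built on such a filter would only rank rooms that a user could actually have entered at that moment.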