Live streaming plays a major role in today's digital platforms, supporting entertainment, education, and social media, among other uses. However, research in this field is limited by the lack of large, publicly available datasets that capture real-time viewer behavior at scale. To address this gap, we introduce YTLive, a public dataset focused on YouTube Live. Collected through the YouTube Researcher Program during May and June 2024, YTLive includes more than 507,000 records from 12,156 live streams, tracking concurrent viewer counts at five-minute intervals along with precise broadcast durations. We describe the dataset design and collection process and present an initial analysis of temporal viewing patterns. Results show that viewer counts are higher and more stable on weekends, especially during afternoon hours. Shorter streams attract larger and more consistent audiences, while longer streams tend to grow slowly and exhibit greater variability. These insights have direct implications for adaptive streaming, resource allocation, and Quality of Experience (QoE) modeling. YTLive offers a timely, open resource to support reproducible research and system-level innovation in live streaming. The dataset is publicly available on GitHub.
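As a rough illustration of the temporal analysis described above, the following minimal sketch groups five-minute viewer-count samples by day type (weekday or weekend) and hour, then reports the mean and the population standard deviation as a simple stability measure. The record layout `(stream_id, timestamp, viewers)`, the sample values, and the function name `summarize` are assumptions made for this example; they do not reflect the actual YTLive schema or the authors' analysis code.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev

# Hypothetical records: (stream_id, ISO 8601 timestamp, concurrent viewers).
# The field layout and values are illustrative, not the real YTLive schema.
SAMPLE = [
    ("a", "2024-05-04T14:00:00", 120),  # Saturday
    ("a", "2024-05-04T14:05:00", 130),
    ("b", "2024-05-06T14:00:00", 80),   # Monday
    ("b", "2024-05-06T14:05:00", 100),
]


def summarize(records):
    """Return {(day_type, hour): (mean_viewers, std_viewers)}."""
    groups = defaultdict(list)
    for _stream_id, timestamp, viewers in records:
        t = datetime.fromisoformat(timestamp)
        # datetime.weekday(): Monday is 0, so 5 and 6 are Saturday and Sunday.
        day_type = "weekend" if t.weekday() >= 5 else "weekday"
        groups[(day_type, t.hour)].append(viewers)
    # A lower standard deviation is read here as a more stable audience.
    return {key: (mean(vals), pstdev(vals)) for key, vals in groups.items()}


if __name__ == "__main__":
    for key, (avg, std) in sorted(summarize(SAMPLE).items()):
        print(key, avg, std)
```

Population standard deviation is only one possible proxy for stability; a coefficient of variation would be a reasonable alternative when comparing streams of very different sizes.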