In short video streaming, popular adaptive bitrate (ABR) algorithms developed for classical long video applications fail catastrophically because they are tuned only to adapt bitrates. Short video adaptive bitrate (SABR) algorithms must instead jointly decide which video to prefetch and at which bitrate level, without sacrificing users' quality of experience (QoE) or incurring noticeable bandwidth wastage. Unfortunately, existing SABR methods suffer from slow convergence and poor generalization. In this paper, we therefore propose Incendio, a novel SABR framework that applies Multi-Agent Reinforcement Learning (MARL) with Expert Guidance. Incendio separates the video ID decision and the video bitrate decision into a buffer management agent and a bitrate adaptation agent, respectively. Together, these agents maximize a system-level utility score modeled as a compound function of QoE and bandwidth wastage metrics. To train Incendio, the agents are first initialized by imitating hand-crafted expert rules and then fine-tuned with MARL. Extensive experiments show that Incendio achieves a 53.2% higher utility score than the current state-of-the-art SABR algorithm while maintaining low training complexity and inference time.
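To make the decoupled decision concrete, here is a minimal sketch of how a prefetching step could be split between a buffer management agent (which video ID to download next) and a bitrate adaptation agent (which bitrate level to use for that video). The agents below are simple hand-crafted rules of the kind that could serve as an expert to imitate before MARL fine-tuning. They are not Incendio's actual rules or policies. Everything here is hypothetical and chosen only for illustration: the `Observation` fields, the 4-second buffer target, the throughput safety factors, the `-1` "pause downloading" action, and the linear form of `utility`. The abstract states only that the utility is a compound function of QoE and bandwidth wastage.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Observation:
    # Hypothetical observation; Incendio's real state space is not specified here.
    buffer_s: List[float]           # seconds buffered per candidate video (current first)
    throughput_kbps: float          # estimated network throughput
    bitrates_kbps: Sequence[float]  # available bitrate ladder, ascending


def buffer_agent(obs: Observation, target_s: float = 4.0) -> int:
    """Hypothetical expert rule: prefetch the earliest video below the buffer target."""
    for video_id, buffered in enumerate(obs.buffer_s):
        if buffered < target_s:
            return video_id
    return -1  # assumed "pause downloading" action when every buffer is full


def bitrate_agent(obs: Observation, video_id: int) -> int:
    """Hypothetical expert rule: highest bitrate under a discounted throughput estimate.

    The rule is more conservative when the chosen video's buffer is nearly empty,
    which shows the bitrate decision conditioning on the buffer agent's choice.
    """
    safety = 0.5 if obs.buffer_s[video_id] < 1.0 else 0.8
    budget = safety * obs.throughput_kbps
    level = 0  # fall back to the lowest level if nothing fits the budget
    for i, rate in enumerate(obs.bitrates_kbps):
        if rate <= budget:
            level = i
    return level


def joint_action(obs: Observation) -> Tuple[int, int]:
    """Compose the two agents: video ID first, then bitrate for that video."""
    video_id = buffer_agent(obs)
    if video_id < 0:
        return -1, -1  # nothing to download this step
    return video_id, bitrate_agent(obs, video_id)


def utility(qoe: float, wasted_mb: float, wastage_weight: float = 0.5) -> float:
    """Assumed linear utility: QoE minus weighted bandwidth wastage (illustrative only)."""
    return qoe - wastage_weight * wasted_mb


if __name__ == "__main__":
    obs = Observation(buffer_s=[6.0, 0.5, 0.0], throughput_kbps=3000.0,
                      bitrates_kbps=[500, 1200, 2500, 4000])
    print(joint_action(obs))          # video 1 (low buffer), conservative bitrate level 1
    print(utility(10.0, 4.0))
```

In a learned setting, each rule would be replaced by a policy network. Because the two decisions are factored, each agent has a much smaller action space than a single agent choosing (video, bitrate) pairs jointly.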