Rate control algorithms are at the heart of video conferencing platforms, determining target bitrates that match dynamic network characteristics to deliver high quality. Recent data-driven strategies have shown promise for this challenging task, but the performance degradation they introduce during training has been a nonstarter for many production services. This paper aims to bolster the practicality of data-driven rate control by presenting an alternative avenue for experiential learning: learning purely from existing telemetry logs produced by the incumbent algorithm in production. We observe that these logs contain effective decisions, although often at the wrong times or in the wrong order. To realize this approach despite the inherent uncertainty that log-based learning brings (i.e., lack of feedback for new decisions), our system, Tarzan, combines a variety of robust learning techniques (i.e., conservatively reasoning about alternate behavior to minimize risk and using a richer model formulation to account for environmental noise). Across diverse networks (emulated and real-world), Tarzan outperforms the widely deployed GCC algorithm, increasing average video bitrates by 15-39% while reducing freeze rates by 60-100%.