The quality of experience (QoE) delivered by video conferencing systems depends heavily on accurately estimating the time-varying available bandwidth between sender and receiver. Bandwidth estimation for real-time communications remains an open challenge for three reasons: network architectures evolve rapidly, protocol stacks are increasingly complex, and it is difficult to define QoE metrics that reliably improve user experience. In this work, we propose a deployed, human-in-the-loop, data-driven framework for bandwidth estimation that addresses these challenges. Our approach first trains objective QoE reward models on subjective user evaluations to measure audio and video quality in real-time video conferencing systems. Next, we collect roughly $1$M network traces with objective QoE rewards from real-world Microsoft Teams calls to curate a bandwidth estimation training dataset. We then introduce a novel distributional offline reinforcement learning (RL) algorithm to train a neural-network-based bandwidth estimator that aims to improve QoE for users. A real-world A/B test shows that the proposed approach reduces the subjective poor call ratio by $11.41\%$ compared to the baseline bandwidth estimator. Finally, we benchmark the proposed offline RL algorithm on D4RL tasks to demonstrate that it generalizes beyond bandwidth estimation.