Learning-based congestion control (CC), including approaches based on reinforcement learning (RL), promises efficient CC in a fast-changing networking landscape, where evolving communication technologies, applications and traffic workloads pose severe challenges to human-derived, static CC algorithms. Learning-based CC is in its early days, and substantial research is required to understand existing limitations, identify research challenges and, eventually, yield deployable solutions for real-world networks. In this paper, we extend our prior work and present a reproducible and systematic study of learning-based CC with the aim of highlighting strengths and uncovering fundamental limitations of the state of the art. We directly contrast these approaches with widely deployed, human-derived CC algorithms, namely TCP Cubic and BBR (version 3). We identify challenges in evaluating learning-based CC, establish a methodology for studying these approaches and perform large-scale experimentation with publicly available learning-based CC approaches. We show that embedding fairness directly into reward functions is effective; however, the fairness properties do not generalise to unseen conditions. We then show that existing RL-based approaches can acquire all available bandwidth while largely maintaining low latency. Finally, we highlight that the latest learning-based CC approaches under-perform when the available bandwidth and end-to-end latency change dynamically, while remaining resistant to non-congestive loss. As with our initial study, our experimentation codebase and datasets are publicly available with the aim of galvanising the research community towards transparency and reproducibility, which have been recognised as crucial for researching and evaluating machine-generated policies.