Recent years have witnessed a plethora of learning-based congestion control (CC) solutions that outperform traditional TCP schemes. However, they fail to consistently provide good convergence properties, including {\em fairness}, {\em fast convergence}, and {\em stability}, because their objective functions do not match these properties. Although integrating these properties into existing learning-based CC is intuitive, it is challenging for two reasons: 1) their training environments are designed to optimize the performance of a single flow and cannot support cooperative multi-flow optimization, and 2) there is no directly measurable metric for expressing these properties in the training objective function. We present Astraea, a new learning-based congestion control scheme that ensures fast convergence to fairness with stability. At the heart of Astraea is a multi-agent deep reinforcement learning framework that explicitly optimizes these convergence properties during training by enabling multiple competing flows to learn interactive policies, while maintaining high performance. We further build a faithful multi-flow environment that emulates the competing behaviors of concurrent flows and explicitly expresses the convergence properties so that they can be optimized during training. We have fully implemented Astraea, and our comprehensive experiments show that Astraea quickly converges to the fairness point and exhibits better stability than its counterparts. For example, when multiple flows compete for the same bottleneck, Astraea achieves near-optimal bandwidth sharing (i.e., fairness), converges up to 8.4$\times$ faster, and reduces throughput deviation by 2.8$\times$, while achieving performance comparable to or even better than prior solutions.
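The evaluation claims above (fairness, convergence speed, throughput deviation) can be made concrete with a few common metrics computed from per-flow throughput traces. The sketch below is a minimal illustration under stated assumptions: Jain's fairness index stands in for fairness, the first interval after which a flow stays within a ±10% band around its fair share stands in for convergence time, and the population standard deviation of post-convergence throughput stands in for stability. These definitions, the 10% tolerance, the function names, and the sample traces are hypothetical and are not necessarily the metrics or data behind the numbers reported above.

```python
from statistics import pstdev


def jain_fairness(throughputs):
    """Jain's fairness index: (sum x)^2 / (n * sum x^2).

    Returns 1.0 when all flows get equal shares and 1/n when a single
    flow takes all of the bandwidth.
    """
    n = len(throughputs)
    if n == 0:
        raise ValueError("need at least one flow")
    sum_sq = sum(x * x for x in throughputs)
    if sum_sq == 0:
        return 1.0  # all flows at zero: trivially equal shares
    return sum(throughputs) ** 2 / (n * sum_sq)


def convergence_time(samples, fair_share, tol=0.1):
    """Index of the first sample after which the flow stays within
    +/- tol of its fair share until the end of the trace.

    Returns None if the trace does not end inside that band.
    tol=0.1 is an illustrative (hypothetical) choice.
    """
    lo, hi = fair_share * (1 - tol), fair_share * (1 + tol)
    start = None
    # Walk backwards over the trailing run of in-band samples.
    for i in range(len(samples) - 1, -1, -1):
        if lo <= samples[i] <= hi:
            start = i
        else:
            break
    return start


def throughput_deviation(samples):
    """Stability proxy: population standard deviation of throughput samples."""
    return pstdev(samples)


# Hypothetical per-interval throughput (Mbps) of two flows sharing 100 Mbps.
flow_a = [90, 70, 54, 51, 50, 49, 50]
flow_b = [10, 30, 46, 49, 50, 51, 50]
fair = 100 / 2

print(jain_fairness([flow_a[0], flow_b[0]]))    # early interval: unequal shares
print(jain_fairness([flow_a[-1], flow_b[-1]]))  # last interval: equal shares
t = convergence_time(flow_a, fair)
print(t, throughput_deviation(flow_a[t:]))
```

Jain's index is used here only because it is a widely used, scale-free summary of how evenly a bottleneck is shared; a real evaluation would compute these quantities over measured traces rather than the toy lists above.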