Fast and efficient transport protocols are the foundation of an increasingly distributed world. The pressure to keep improving communication performance for next-generation applications and services, combined with the growing heterogeneity of systems and network technologies, has encouraged the design of Congestion Control (CC) algorithms that perform well under specific environments. Designing a generic CC algorithm that can adapt to a broad range of scenarios remains an open research question. To tackle this challenge, we propose a novel Reinforcement Learning (RL) approach. Our solution, MARLIN, uses the Soft Actor-Critic algorithm to maximize both entropy and return, and models the learning process as an infinite-horizon task. We trained MARLIN on a real network with varying background traffic patterns to overcome the sim-to-real mismatch that researchers have encountered when applying RL to CC. We evaluated our solution on a file transfer task and compared it to TCP Cubic. While further research is required, the results show that MARLIN can achieve performance comparable to TCP Cubic with little hyperparameter tuning, in a task significantly different from its training setting. We therefore believe that our work represents a promising first step toward building CC algorithms based on the maximum entropy RL framework.
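For reference, the objective that maximum entropy RL methods such as Soft Actor-Critic are usually built around can be written in discounted, infinite-horizon form as below. This is the standard textbook formulation, not a definition taken from this abstract: the discount factor \gamma, the temperature \alpha, and the reward r(s_t, a_t) are generic notation assumed here, and MARLIN's actual state, action, and reward design is not specified in the text above.

\[
J(\pi) = \mathbb{E}_{\tau \sim \pi}\left[ \sum_{t=0}^{\infty} \gamma^{t} \Big( r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \Big) \right],
\qquad
\mathcal{H}\big(\pi(\cdot \mid s)\big) = -\,\mathbb{E}_{a \sim \pi(\cdot \mid s)}\big[ \log \pi(a \mid s) \big]
\]

Under this reading, the temperature \alpha sets the trade-off between return and policy entropy: a larger value favors exploration, and \alpha = 0 recovers the conventional expected-return objective.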