Conventional Congestion Control (CC) algorithms, such as Transmission Control Protocol (TCP) Cubic, struggle in tactical environments because they misinterpret packet loss and fluctuating network performance as symptoms of congestion. Recent efforts, including our own MARLIN, have explored the use of Reinforcement Learning (RL) for CC, but they often fail to generalize, particularly in competitive, unstable, and unforeseen scenarios. To address these challenges, this paper proposes an RL framework that leverages an accurate and parallelizable emulation environment to reenact the conditions of a tactical network. We also introduce a refined RL formulation and performance evaluation methods tailored for agents operating in such intricate scenarios. We evaluate our RL framework by training a MARLIN agent in conditions replicating a bottleneck link transition between a Satellite Communication (SATCOM) link and a UHF Wide Band (UHF) radio link. Finally, we compare its performance in file transfer tasks against TCP Cubic and the default strategy implemented in the Mockets tactical communication middleware. The results demonstrate that the MARLIN RL agent outperforms both TCP Cubic and Mockets from several perspectives and highlight the effectiveness of specialized RL solutions in optimizing CC for tactical network environments.