Robust and Scalable Routing with Multi-Agent Deep Reinforcement Learning for MANETs

Highly dynamic mobile ad-hoc networks (MANETs) remain one of the most challenging environments in which to develop and deploy robust, efficient, and scalable routing protocols. In this paper, we present DeepCQ+ routing, which integrates emerging multi-agent deep reinforcement learning (MADRL) techniques into existing Q-learning-based routing protocols and their variants in a novel manner. DeepCQ+ achieves consistently higher performance across a wide range of MANET configurations while being trained on only a limited range of network parameters and conditions. Quantitatively, DeepCQ+ delivers consistently higher end-to-end throughput with lower overhead than its Q-learning-based counterparts, for an overall efficiency gain of 10-15%. Qualitatively and more significantly, DeepCQ+ maintains remarkably similar performance gains in many scenarios it was not trained for, in terms of network size, mobility conditions, and traffic dynamics. To the best of our knowledge, this is the first successful demonstration of MADRL for the MANET routing problem that achieves and maintains a high degree of scalability and robustness even in environments outside the trained range of scenarios. This implies that the hybrid design of DeepCQ+, which combines MADRL and Q-learning, significantly increases its practicality and explainability, because real-world MANET environments will likely vary beyond the range of scenarios seen during training.

翻译：高度动态的移动性特设网络(MANETs)继续作为最具挑战性的环境之一,以开发和部署强大、高效和可扩缩的路线规程。在本文件中,我们展示了DeepC ⁇ 路由,以新颖的方式将新兴的多剂深度强化学习(MADRL)技术纳入现有的基于学习的路线规程及其变体,并在广泛的MANET配置中取得持续的更高绩效,同时仅就有限的网络参数和条件进行培训。从数量上看,DeepC ⁇ 显示,与基于Q学习的对应方相比,终端对终端对终端的吞吐量一直较高,其管理管理量低于基于Q-学习的对应方,其效率总体收益为10-15%。从质量上看,更重要的是,DeepC ⁇ 在许多没有在网络规模、流动性条件和交通动态方面受过培训的情景下,保持了非常相似的业绩增益。据我们所知,这是MADRL首次成功展示了实现并保持高水平水平和高水平的顶端端管理,其基于QMADR环境中经过培训的深度设计范围,从而隐含了对外部设计环境进行深度的深度设计的深度增长的可能性。