Communication networks able to withstand hostile environments are critically important for disaster relief operations. In this paper, we consider a challenging scenario where drones have been compromised in the supply chain, during their manufacture, and harbour malicious software capable of wide-ranging and infectious disruption. We investigate multi-agent deep reinforcement learning as a tool for learning defensive strategies that maximise communications bandwidth despite continual adversarial interference. Using a public challenge for learning network resilience strategies, we propose a state-of-the-art expert technique and study its superiority over deep reinforcement learning agents. Correspondingly, we identify three specific methods for improving the performance of our learning-based agents: (1) ensuring each observation contains the necessary information, (2) using expert agents to provide a curriculum for learning, and (3) paying close attention to reward. We apply our methods and present a new mixed strategy enabling expert and learning-based agents to work together and improve on all prior results.