Designing effective routing strategies for mobile wireless networks is challenging because routing behavior must adapt seamlessly to spatially diverse and temporally changing network conditions. In this work, we use deep reinforcement learning (DeepRL) to learn a scalable and generalizable single-copy routing strategy for such networks. We make the following contributions: i) we design a reward function that enables the DeepRL agent to explicitly trade off competing network goals, such as minimizing delay vs. the number of transmissions per packet; ii) we propose a novel set of relational neighborhood, path, and context features to characterize mobile wireless networks and model device mobility independently of a specific network topology; and iii) we use a flexible training approach that allows us to combine data from all packets and devices into a single offline centralized training set to train a single DeepRL agent. To evaluate generalizability and scalability, we train our DeepRL agent on one mobile network scenario and then test it on other mobile scenarios, varying the number of devices and transmission ranges. Our results show that the learned single-copy routing strategy achieves lower delay than every other strategy we compare against except the optimal strategy, even on scenarios on which the DeepRL agent was not trained.
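To make the delay vs. transmissions trade-off in contribution i) concrete, the following is a minimal sketch of one way such a reward could be shaped. The abstract does not give the actual reward function, so everything here is an assumption for illustration: the names `RewardWeights`, `step_reward`, and `path_return`, the two actions `"hold"` and `"forward"`, and the additive per-step cost form are all hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class RewardWeights:
    """Hypothetical cost weights; the values are not taken from the paper."""
    delay: float = 1.0         # cost charged for every time step a packet is in flight
    transmission: float = 1.0  # extra cost charged each time the packet is forwarded


def step_reward(forwarded: bool, weights: RewardWeights) -> float:
    """Reward for one decision: always pay for delay, pay more if we transmit."""
    reward = -weights.delay
    if forwarded:
        reward -= weights.transmission
    return reward


def path_return(actions: Sequence[str], weights: RewardWeights) -> float:
    """Undiscounted sum of step rewards over one packet's sequence of actions.

    Each action is either "hold" (keep the packet) or "forward" (transmit it
    to a neighbor). This two-action encoding is an assumption of the sketch.
    """
    return sum(step_reward(action == "forward", weights) for action in actions)


# Two hypothetical ways to deliver the same packet:
quick = ["forward"] * 3              # 3 time steps, 3 transmissions
patient = ["hold"] * 5 + ["forward"] # 6 time steps, 1 transmission

delay_focused = RewardWeights(delay=1.0, transmission=0.5)
energy_focused = RewardWeights(delay=1.0, transmission=3.0)
```

Under these assumed weights, `delay_focused` scores the quick path higher than the patient one, while `energy_focused` reverses the ranking. Changing the relative weights is what shifts the preferred behavior between fewer time steps and fewer transmissions.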