Routing algorithms are crucial for efficient computer network operations, and in many settings they must react to traffic bursts within milliseconds. Live telemetry data can provide informative signals to routing algorithms, and recent work has trained neural networks to exploit such signals for traffic-aware routing. Yet aggregating network-wide information is subject to communication delays, and existing neural approaches either assume unrealistic delay-free global states or restrict routers to purely local telemetry. This leaves their deployability in real-world environments unclear. We cast telemetry-aware routing as a delay-aware closed-loop control problem and introduce a framework that trains and evaluates neural routing algorithms while explicitly modeling communication and inference delays. On top of this framework, we propose LOGGIA, a scalable graph neural routing algorithm that predicts log-space link weights from attributed topology-and-telemetry graphs. It uses a data-driven pre-training stage followed by on-policy reinforcement learning. Across synthetic and real network topologies and unseen mixed TCP/UDP traffic sequences, LOGGIA consistently outperforms shortest-path baselines, whereas neural baselines fail once realistic delays are enforced. Our experiments further suggest that neural routing algorithms like LOGGIA perform best when deployed fully locally, i.e., with every router observing network states and inferring actions individually, as opposed to centralized decision making.
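As a rough illustration of what predicting log-space link weights can look like in practice, the sketch below exponentiates per-link log-weights into strictly positive costs and runs Dijkstra's algorithm over them. This is an assumption made for illustration: the abstract does not state how the predicted weights are turned into forwarding decisions, and the function name `shortest_path`, the four-node topology, and the numeric values are all hypothetical stand-ins for model outputs.

```python
import heapq
import math


def shortest_path(log_weights, source, target):
    """Dijkstra over directed links whose cost is exp(log-weight).

    log_weights maps (u, v) link tuples to real-valued log-space weights.
    Returns the node list of the cheapest path, or None if unreachable.
    """
    # Exponentiation maps any real log-weight to a strictly positive cost,
    # which is what Dijkstra's algorithm requires.
    graph = {}
    for (u, v), lw in log_weights.items():
        graph.setdefault(u, []).append((v, math.exp(lw)))

    dist = {source: 0.0}
    prev = {}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == target:
            break
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))

    if target not in dist:
        return None
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1]


# Hypothetical log-space weights standing in for model predictions.
log_weights = {
    ("A", "B"): 0.0,
    ("B", "D"): 0.0,
    ("A", "C"): -1.0,
    ("C", "D"): 2.0,  # a large log-weight, e.g. a link predicted to be congested
}
path = shortest_path(log_weights, "A", "D")

# Lowering the log-weight of the C-D link shifts the route onto it.
relieved = dict(log_weights)
relieved[("C", "D")] = -1.0
path_relieved = shortest_path(relieved, "A", "D")
```

Working in log space lets a model emit unconstrained real numbers while the resulting link costs stay positive, and a fixed additive change in the prediction corresponds to a multiplicative change in cost.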