FERN: Leveraging Graph Attention Networks for Failure Evaluation and Robust Network Design

Robust network design, which aims to guarantee network availability under various failure scenarios while optimizing performance/cost objectives, has received significant attention. Existing approaches either rely on model-based mixed-integer optimization, which is hard to scale, or employ deep learning to solve specific engineering problems but with limited generalizability. In this paper, we show that failure evaluation provides a common kernel that improves the tractability and scalability of existing solutions. By approximating this common kernel with a neural network built on graph attention networks, we develop a unified learning-based framework, FERN, for scalable Failure Evaluation and Robust Network design. FERN represents rich problem inputs as a graph and captures both local and global views by performing attention-based feature extraction on the graph. It enables a broad range of robust network design problems, including the three discussed in this paper (robust network validation, network upgrade optimization, and fault-tolerant traffic engineering), to be recast in terms of the common kernel and thus computed efficiently using neural networks over a small set of critical failure scenarios. Extensive experiments on real-world network topologies show that FERN can efficiently and accurately identify key failure scenarios for both OSPF and optimal routing schemes, and that it generalizes well to different topologies and input traffic patterns. It speeds up these three robust network design problems by more than 80x, 200x, and 10x, respectively, with a negligible performance gap.
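The failure-evaluation kernel mentioned above can be illustrated with a brute-force baseline: enumerate failure scenarios, re-route traffic under each one, and score it. The sketch below is a minimal illustration under stated assumptions, not FERN's implementation: it assumes single-shortest-path OSPF-style routing (no ECMP), undirected links whose load sums both directions, and maximum link utilization (MLU) as the performance metric. The topology, demands, and function names (`shortest_path`, `max_link_utilization`, `critical_failures`) are hypothetical.

```python
import heapq
from itertools import combinations


def link(u, v):
    """Hypothetical helper: an undirected link is a frozenset of its two endpoints."""
    return frozenset((u, v))


def shortest_path(links, weights, src, dst):
    """Dijkstra over undirected links (OSPF-like, single path, no ECMP).

    Returns the path as a list of links, or None if dst is unreachable.
    """
    adj = {}
    for u, v in links:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v in adj.get(u, []):
            nd = d + weights[link(u, v)]
            if nd < dist.get(v, float("inf")):
                dist[v], prev[v] = nd, u
                heapq.heappush(heap, (nd, v))
    if dst not in dist:
        return None
    path, node = [], dst
    while node != src:
        path.append(link(node, prev[node]))
        node = prev[node]
    return path


def max_link_utilization(capacity, weights, demands, failed):
    """Failure evaluation for one scenario: MLU with `failed` links down.

    Returns inf if any demand loses connectivity.
    """
    alive = [tuple(e) for e in capacity if e not in failed]
    load = {e: 0.0 for e in capacity if e not in failed}
    for (s, t), volume in demands.items():
        path = shortest_path(alive, weights, s, t)
        if path is None:
            return float("inf")
        for e in path:
            load[e] += volume
    return max((load[e] / capacity[e] for e in load), default=0.0)


def critical_failures(capacity, weights, demands, k_fail=1, top=3):
    """Enumerate every k-link failure scenario and return the `top` worst by MLU."""
    scored = []
    for failed in combinations(capacity, k_fail):
        mlu = max_link_utilization(capacity, weights, demands, set(failed))
        scored.append((mlu, sorted(tuple(sorted(e)) for e in failed)))
    scored.sort(key=lambda item: item[0], reverse=True)
    return scored[:top]


# Hypothetical example: ring A-B-C-D-A plus a stub link D-E.
capacity = {link("A", "B"): 10, link("B", "C"): 10, link("C", "D"): 10,
            link("D", "A"): 10, link("D", "E"): 10}
weights = {e: 1 for e in capacity}
demands = {("A", "E"): 3, ("B", "C"): 5}

print(max_link_utilization(capacity, weights, demands, set()))
print(critical_failures(capacity, weights, demands, k_fail=1, top=3))
```

Here the no-failure MLU is 0.5, failing the stub link D-E disconnects E (MLU reported as infinity), and failing B-C or A-D pushes the MLU to 0.8. Exhaustive enumeration needs one routing pass per scenario, and the number of k-link scenarios grows combinatorially with k, which is the kind of cost a learned approximation of the kernel aims to avoid.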

Networking

Networking: IFIP International Conferences on Networking. Publisher: IFIP. Site: http://dblp.uni-trier.de/db/conf/networking/index.html