Traffic Engineering (TE) is critical for improving network performance and reliability. A key challenge in TE is managing sudden traffic bursts. Existing TE schemes either do not handle traffic bursts or guard against them uniformly, so they struggle to balance normal-case performance against burst-case performance. To address this issue, we introduce FIGRET, a Fine-Grained Robustness-Enhanced TE scheme. FIGRET takes a new approach to TE: it provides different levels of robustness enhancement, customized to the distinct traffic characteristics of each source-destination pair. By combining a burst-aware loss function with deep learning, FIGRET generates high-quality TE solutions efficiently. Our evaluations on real-world production networks, including Wide Area Networks and data centers, demonstrate that FIGRET significantly outperforms existing TE schemes. Compared to the TE scheme currently deployed in Google's Jupiter data center networks, FIGRET reduces average Maximum Link Utilization by 9\%-34\% and computes solutions $35\times$-$1800\times$ faster. Compared to DOTE, a state-of-the-art deep learning-based TE method, FIGRET reduces the occurrence of significant congestion events caused by traffic bursts by 41\%-53.9\% in topologies with high traffic dynamics.
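The headline metric above, Maximum Link Utilization (MLU), is the largest ratio of carried traffic to link capacity over all links in the network. The following is a minimal sketch of how MLU can be computed for one fixed TE configuration. It assumes that each source-destination pair's demand is split across precomputed paths in fixed fractions, and that every link on every path appears in the capacity table. The function name and data layout are hypothetical illustrations, not FIGRET's implementation, and the burst-aware loss is not modeled here.

```python
from collections import defaultdict


def max_link_utilization(capacity, demands, path_splits):
    """Return the MLU of a TE configuration (hypothetical helper).

    capacity:    {(u, v): capacity of directed link u->v}
    demands:     {(src, dst): traffic volume of that SD pair}
    path_splits: {(src, dst): [(path, fraction), ...]}, where each path is a
                 list of nodes and the fractions for one SD pair sum to 1
    """
    load = defaultdict(float)
    for sd, volume in demands.items():
        for path, fraction in path_splits[sd]:
            # Every link on the path carries this path's share of the demand.
            for link in zip(path, path[1:]):
                load[link] += volume * fraction
    # MLU is the largest load-to-capacity ratio over all links.
    return max(load[link] / cap for link, cap in capacity.items())


# Triangle topology with 10 units of capacity per directed link.
capacity = {("A", "B"): 10.0, ("B", "C"): 10.0, ("A", "C"): 10.0}
demands = {("A", "C"): 12.0}

# Splitting A->C evenly over the direct and two-hop paths puts 6 units on each link.
split = {("A", "C"): [(["A", "C"], 0.5), (["A", "B", "C"], 0.5)]}
# Sending everything on the direct path overloads A->C.
direct_only = {("A", "C"): [(["A", "C"], 1.0)]}

print(max_link_utilization(capacity, demands, split))        # 0.6
print(max_link_utilization(capacity, demands, direct_only))  # 1.2
```

The example shows why the split ratios matter. The same demand gives an MLU of 0.6 when it is spread over two paths and 1.2 when it is routed on a single path. A TE scheme picks these ratios with the goal of keeping MLU low.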