Modern communication networks implement local fast failover mechanisms in the data plane, which respond swiftly to link failures using pre-installed rerouting rules. This paper studies resilient routing that tolerates up to $k$ simultaneous link failures and guarantees packet delivery as long as the source and destination remain connected. Past theoretical work studied failover routing under static link failures, i.e., links that fail permanently and simultaneously. Real-world networks, however, often face link flapping, i.e., dynamic down states caused by, e.g., numerous short-lived software-related faults. In this initial work, we therefore re-investigate the resilience of failover routing against link flapping by classifying link failures as static, semi-dynamic (removing the assumption that links fail simultaneously), or dynamic (removing the assumption that links fail permanently), shedding light on the capabilities and limitations of failover routing in each scenario. We show that $k$-edge-connected graphs admit $(k-1)$-resilient routing against dynamic failures for $k \leq 5$. This result extends to arbitrary $k$ if $\log k$ bits in the packet header can be rewritten. Rewriting $3$ bits suffices to cope with $k$ semi-dynamic failures. On general graphs, however, tolerating $2$ dynamic failures is impossible without bit rewriting. Even with $\log k$ rewritable bits, resilient routing cannot cope with $k$ dynamic failures, which demonstrates the limitations of local fast rerouting.