We present CrossCheck, a system that validates inputs to the Software-Defined Networking (SDN) controller in a Wide Area Network (WAN). By detecting incorrect inputs, which often stem from bugs in the SDN control infrastructure, CrossCheck alerts operators before those inputs trigger network outages. Our analysis at a large-scale WAN operator identifies invalid inputs as a leading cause of major outages, and we show how CrossCheck would have prevented those incidents. We deployed CrossCheck as a shadow validation system for four weeks in a production WAN. During that period it accurately detected the single incident of invalid inputs that occurred, while sustaining a 0% false positive rate under normal operation and hence imposing little additional burden on operators. In addition, we show through simulation that CrossCheck reliably detects a wide range of invalid inputs (e.g., demand perturbations as small as 5% with 100% accuracy) and maintains a near-zero false positive rate for realistic levels of noisy, missing, or buggy telemetry data (e.g., zero false positives with up to 30% of the telemetry data corrupted).