The emergence of programmable switches has brought in-network computing (INC) into the spotlight in recent years. By offloading computation directly onto the data transmission process, INC improves network utilization, reduces latency to sub-RTT levels, saves link bandwidth, and maintains throughput. However, INC disrupts the transparency of traditional networks, forcing developers to consider network exceptions like packet loss and out-of-order. If not properly handled, these exceptions can lead to violations of application properties, such as cache consistency and lock exclusion. Usual testing cannot exhaustively cover these exceptions, raising doubts about the correctness of INC systems and hindering their deployment in the industry. This paper presents INCGuard, the first general-purpose tool for verifying INC systems. INCGuard provides a high-level specification language and saves developers 67.2% lines of code on average. To help better understand the behavior of the system, INCGuard offers configurable network environments. INCGuard enables developers to express INC-specific correctness properties. INCGuard translates developer-specified systems into state transition representations, performs model checking to detect potential design risks, and reports violation traces to developers. We propose optimizations for INC-specific scenarios to address the challenge of state space explosion. We modeled seven INC systems and identified their risks with INCGuard in seconds. We further reproduce them in real systems to confirm the validity of our verification result.
翻译:可编程交换机的出现使得网内计算(INC)近年来备受关注。通过将计算任务直接卸载至数据传输过程,INC 能够提升网络利用率、将延迟降至亚RTT级别、节省链路带宽并保持高吞吐量。然而,INC 打破了传统网络的透明性,迫使开发者必须应对数据包丢失与乱序等网络异常。若未能妥善处理,这些异常可能导致缓存一致性与锁互斥等应用属性的违规。常规测试无法穷尽覆盖此类异常,从而引发对INC系统正确性的质疑,并阻碍其在工业界的部署。本文提出 INCGuard——首个面向INC系统的通用验证工具。INCGuard 提供了高层规约语言,平均为开发者节省67.2%的代码量。为帮助开发者更深入理解系统行为,INCGuard 提供了可配置的网络环境。INCGuard 支持开发者表达INC特有的正确性属性,将开发者指定的系统转化为状态转换表示,通过模型检测发现潜在设计风险,并向开发者报告违规轨迹。针对INC特有场景,我们提出了优化方案以应对状态空间爆炸问题。我们对七个INC系统进行建模,并在数秒内通过INCGuard识别其风险。为验证结果的可靠性,我们在真实系统中复现了这些风险。