TCP stack processing consumes a large share of host CPU resources, limiting scalability in data centers. Existing offload methods typically cover only part of the stack's functionality or lack flexibility. This paper introduces PnO (Plug & Offload), an approach that fully and transparently offloads TCP processing onto off-path SmartNICs (NVIDIA BlueField DPUs). Key to our solution is PnO-TCP, a novel TCP stack designed for efficient execution on the DPU's general-purpose cores and spanning both the host and the SmartNIC to facilitate the offload. PnO-TCP is a lightweight, user-space stack based on DPDK that achieves high performance despite the relatively modest computational power of off-path SmartNIC cores. Our evaluation, using real-world applications (Redis, Lighttpd, and HAProxy), demonstrates that PnO achieves transparent TCP stack offloading, leading to substantial reductions in host CPU usage and, in many cases, significant performance improvements, particularly in small-packet scenarios (< 2 KB), where gains of 34%-127% in requests per second (RPS) were observed in single-threaded tests.