Software network functions (NFs) offer flexibility and ease of deployment, but at the cost of making high performance harder to achieve. The traditional way to increase NF performance is to distribute traffic across multiple CPU cores, but this poses a significant challenge: how can an NF be parallelized without breaking its semantics? We propose Maestro, a tool that analyzes a sequential implementation of an NF and automatically generates an enhanced parallel version that carefully configures the NIC's Receive Side Scaling (RSS) mechanism to distribute traffic across cores while preserving semantics. When possible, Maestro orchestrates a shared-nothing architecture, in which each core operates independently without shared-memory coordination, maximizing performance. Otherwise, Maestro choreographs a fine-grained read-write locking mechanism that optimizes operation for typical Internet traffic. We parallelized 8 software NFs and show that they generally scale up linearly until bottlenecked by PCIe when using small packets, or by the 100 Gbps line rate with typical Internet traffic. Maestro further outperforms modern hardware-based transactional memory mechanisms, even for challenging parallel-unfriendly workloads.
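To make the shared-nothing idea concrete, the following is a minimal illustrative sketch, not Maestro's generated code. It assumes a hypothetical per-flow packet counter and stands in for the NIC's RSS hash with a simple direction-independent CRC over the flow endpoints (real NICs typically use a Toeplitz hash with a configured key). The names `Packet`, `core_for`, `run`, and `NUM_CORES` are invented for illustration. The point it shows is that if every packet of a flow, in both directions, lands on the same core, each core can keep a private state table and no locking is needed.

```python
import zlib
from collections import namedtuple

# Hypothetical packet representation: only the fields used for dispatch.
Packet = namedtuple("Packet", "src_ip src_port dst_ip dst_port")

NUM_CORES = 4  # assumed core count, chosen arbitrarily for the sketch


def flow_key(pkt):
    """Return a direction-independent flow identifier.

    Sorting the two endpoints makes A->B and B->A yield the same key.
    """
    a = (pkt.src_ip, pkt.src_port)
    b = (pkt.dst_ip, pkt.dst_port)
    return tuple(sorted([a, b]))


def core_for(pkt, num_cores=NUM_CORES):
    """Pick a core from a hash of the flow key (a stand-in for NIC RSS)."""
    lo, hi = flow_key(pkt)
    data = f"{lo[0]}:{lo[1]}|{hi[0]}:{hi[1]}".encode()
    return zlib.crc32(data) % num_cores


def run(packets, num_cores=NUM_CORES):
    """Count packets per flow using one private table per core."""
    tables = [dict() for _ in range(num_cores)]
    for pkt in packets:
        table = tables[core_for(pkt, num_cores)]  # only this core's state is touched
        key = flow_key(pkt)
        table[key] = table.get(key, 0) + 1
    return tables
```

Because the dispatch key is symmetric, a flow's state lives in exactly one table. An NF whose state cannot be partitioned this way (for example, state keyed on something the hash cannot isolate) would instead need shared state protected by locks, which is the case the read-write locking mechanism addresses.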