Software network functions (NFs) gain flexibility and ease of deployment at the cost of a harder performance challenge. The traditional way to increase NF performance is to distribute traffic across multiple CPU cores, but this raises a significant question: how can an NF be parallelized without breaking its semantics? We propose Maestro, a tool that analyzes a sequential implementation of an NF and automatically generates an enhanced parallel version that carefully configures the NIC's Receive Side Scaling (RSS) mechanism to distribute traffic across cores while preserving semantics. When possible, Maestro orchestrates a shared-nothing architecture in which each core operates independently, without shared-memory coordination, maximizing performance. Otherwise, Maestro choreographs a fine-grained read-write locking mechanism that is optimized for typical Internet traffic. We parallelized 8 software NFs and show that they generally scale up linearly until bottlenecked by PCIe when using small packets, or by the 100 Gbps line rate with typical Internet traffic. Maestro further outperforms modern hardware-based transactional memory mechanisms, even for challenging parallel-unfriendly workloads.
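The shared-nothing idea described above can be illustrated with a minimal software sketch. It is not Maestro's generated code and does not configure a real NIC: `zlib.crc32` stands in for the NIC's RSS hash, and the names `NUM_CORES`, `flow_key`, `pick_core`, `flow_tables`, and `process_packet` are hypothetical. The direction-independent key is an assumption made for illustration, so that both directions of a connection reach the same core.

```python
import zlib
from collections import defaultdict

NUM_CORES = 4  # hypothetical core count, chosen only for illustration


def flow_key(src_ip, src_port, dst_ip, dst_port, proto):
    """Build a direction-independent key for a flow.

    Sorting the two endpoints means a packet and its reply produce the
    same bytes (an assumption of this sketch, not a claim about Maestro).
    """
    a, b = sorted([(src_ip, src_port), (dst_ip, dst_port)])
    return f"{a[0]}:{a[1]}|{b[0]}:{b[1]}|{proto}".encode()


def pick_core(src_ip, src_port, dst_ip, dst_port, proto, num_cores=NUM_CORES):
    """Map a flow to a core index.

    crc32 is only a stand-in for the hash a NIC computes in hardware.
    """
    key = flow_key(src_ip, src_port, dst_ip, dst_port, proto)
    return zlib.crc32(key) % num_cores


# One private flow table per core. Because every packet of a flow is
# steered to the same core, no table is touched by two cores and no
# locking is needed.
flow_tables = [defaultdict(int) for _ in range(NUM_CORES)]


def process_packet(pkt):
    """Count a packet in the table owned by the core the flow maps to."""
    core = pick_core(*pkt)
    flow_tables[core][flow_key(*pkt)] += 1
    return core
```

Calling `process_packet` on a packet and on its reply (source and destination swapped) returns the same core index, so the per-flow state lives in exactly one table. When an NF's state cannot be partitioned this way, cores would have to share tables, which is the case the abstract addresses with read-write locking.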