Network function (NF) offloading on SmartNICs has been widely adopted in modern data centers, as it saves host resources and offers programmability. However, co-running NFs on the same SmartNIC can cause performance interference due to contention for onboard resources. Therefore, to meet performance SLAs while ensuring efficient resource management, operators need mechanisms to predict NF performance under such contention. Existing solutions lack SmartNIC-specific knowledge and have limited traffic awareness, leading to poor accuracy for on-NIC NFs. This paper proposes Tomur, a novel performance prediction system for on-NIC NFs. Tomur builds on the key observation that co-located NFs contend for multiple resources, including onboard accelerators and the memory subsystem. It also incorporates traffic awareness based on the behavior of each individual resource, so that prediction accuracy is maintained as external traffic attributes vary. Evaluation on a BlueField-2 SmartNIC shows that, compared to state-of-the-art approaches, Tomur improves prediction accuracy by 78.8% and reduces SLA violations by 92.2%, and it enables new practical use cases.