Versatile yet Efficient Network Traffic Analysis: Offloading Network Foundation Model to SmartNIC

Pervasive encryption makes large-scale labeling infeasible for traffic analysis, while security operations demand analysis at the edge to avert service degradation and further vulnerabilities. These pressures have produced two disjoint research lines: 1) versatile analysis, which uses network foundation models to reduce label dependency, and 2) efficient analysis, which uses hardware offloading to reduce analysis latency. Versatility and efficiency have appeared fundamentally incompatible, with prior work consistently sacrificing one for the other. We show that this incompatibility is instead a consequence of polarized design choices across the three components of traffic analysis systems: traffic processing, model architecture, and analysis execution. In response, we present Nepco, a versatile yet efficient network traffic analysis system that offloads network foundation models to a SmartNIC. Our key observation is that discriminative traffic information is concentrated in localized byte regions, which motivates localized byte-sequence modeling in place of inefficient global modeling. To exploit this without incurring the latency bottlenecks of complex encoding steps, we employ a hardware-friendly processing pipeline that directly embeds raw byte sequences. To maintain versatility across diverse tasks, we propose a pattern-aware convolutional architecture equipped with dedicated scoring and gating mechanisms. By exploiting translation invariance, this design dynamically locates and extracts salient semantic signatures. We prototype Nepco on the Nvidia BlueField-3 SmartNIC with multi-engine collaborative analysis execution. Experimental results demonstrate that Nepco achieves macro F1 competitive with the best performance of 8 state-of-the-art network foundation models, while reducing end-to-end latency by 328x to the millisecond scale.
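To make the idea of localized byte-sequence modeling concrete, the following is a minimal NumPy sketch, not Nepco's actual architecture or its SmartNIC implementation. It embeds raw bytes directly, applies a small 1D convolution over byte windows, and then pools the window features with a per-position score and a per-channel gate. All names (`embed_bytes`, `local_conv`, `score_and_gate`, `analyze`, `with_signature`), the dimensions, the random weights, and the 4-byte marker are assumptions chosen for illustration; a real model would learn its parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM, KERNELS, WIDTH, MAX_LEN = 8, 4, 3, 64

# Hypothetical random parameters; a trained model would learn these.
embed_table = rng.normal(size=(257, DIM))        # 256 byte values + 1 padding id
conv_kernels = rng.normal(size=(KERNELS, WIDTH, DIM))
score_w = rng.normal(size=KERNELS)
gate_w = rng.normal(size=(KERNELS, KERNELS))


def embed_bytes(payload: bytes) -> np.ndarray:
    """Look up one vector per raw byte; no tokenizer or feature encoding."""
    ids = np.full(MAX_LEN, 256, dtype=np.int64)  # 256 marks padding
    raw = np.frombuffer(payload[:MAX_LEN], dtype=np.uint8)
    ids[: raw.size] = raw
    return embed_table[ids]                      # (MAX_LEN, DIM)


def local_conv(x: np.ndarray) -> np.ndarray:
    """Apply the same kernels to every WIDTH-byte window (valid convolution)."""
    windows = np.lib.stride_tricks.sliding_window_view(x, WIDTH, axis=0)
    return np.einsum("pdw,kwd->pk", windows, conv_kernels)  # (positions, KERNELS)


def score_and_gate(feats: np.ndarray):
    """Score each position, gate each channel, and pool into one vector."""
    scores = feats @ score_w
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()                         # softmax over positions
    gates = 1.0 / (1.0 + np.exp(-(feats @ gate_w)))  # sigmoid gate per channel
    pooled = (weights[:, None] * gates * feats).sum(axis=0)
    return pooled, weights


def analyze(payload: bytes):
    """Run the full sketch: embed, convolve locally, then score and gate."""
    return score_and_gate(local_conv(embed_bytes(payload)))


def with_signature(offset: int) -> bytes:
    """Build a MAX_LEN-byte zero payload with a 4-byte marker at `offset`."""
    buf = bytearray(MAX_LEN)
    buf[offset : offset + 4] = b"\x16\x03\x01\x02"
    return bytes(buf)


# The same marker placed at two different interior offsets.
vec_a, w_a = analyze(with_signature(5))
vec_b, w_b = analyze(with_signature(40))
print(np.allclose(vec_a, vec_b))
```

Because the kernels are shared across positions and the pooling is a weighted sum over positions, moving the marker within the payload only shifts where the scores peak. The pooled vectors `vec_a` and `vec_b` should therefore agree up to floating-point rounding, which is the translation-invariance property the abstract refers to.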
