Network traffic analysis increasingly relies on complex machine learning models as the internet consolidates and more traffic is encrypted. However, on high-bandwidth networks, flows can easily arrive faster than models can perform inference. The temporal nature of network flows limits the simple scale-out approaches leveraged in other high-traffic machine learning applications. Accordingly, this paper presents ServeFlow, a machine-learning model serving solution for network traffic analysis tasks, which carefully selects the number of packets to collect and the models to apply to individual flows, balancing minimal latency, high service rate, and high accuracy. We find that, on the same task, inference time can differ across models by 2.7x-136.3x, while the median inter-packet waiting time is often 6-8 orders of magnitude higher than the inference time! ServeFlow is able to make inferences on 76.3% of flows in under 16ms, a 40.5x speed-up in median end-to-end serving latency, while increasing the service rate and maintaining similar accuracy. Even with thousands of features per flow, it achieves a service rate of over 48.5k new flows per second on a 16-core commodity CPU server, matching the order of magnitude of flow rates observed on city-level network backbones.