Access to raw network traffic data is essential for many computer networking tasks, from traffic modeling to performance evaluation. Unfortunately, such data is scarce because of high collection costs and governance rules. Previous efforts address this scarcity by generating synthetic network data, but they fail to reliably handle multi-flow sessions, struggle to reason about stateful communication in moderate- to long-duration network sessions, and lack robust evaluations tied to real-world utility. We propose NetSSM, a new method based on state-space models that generates raw network traffic at packet-level granularity. Our approach captures interactions between multiple interleaved flows, an objective unexplored in prior work, and effectively reasons about flow state within sessions to capture traffic characteristics. NetSSM accomplishes this by learning from and producing traces that are 8x and 78x longer, respectively, than those handled by existing transformer-based approaches. Evaluation results show that our method generates high-fidelity traces that outperform prior efforts on existing benchmarks. We also find that NetSSM's traces have high semantic similarity to real network data, both in compliance with standard protocol requirements and in flow- and session-level traffic characteristics.