Access to raw network traffic data is essential for many computer networking tasks, from traffic modeling to performance evaluation. Unfortunately, this data is scarce due to high collection costs and governance rules. Previous efforts address this challenge by generating synthetic network data, but they fail to reliably handle multi-flow sessions, struggle to reason about stateful communication in moderate- to long-duration network sessions, and lack robust evaluations tied to real-world utility. We propose NetSSM, a new method based on state-space models that generates raw network traffic at packet-level granularity. Our approach captures interactions between multiple, interleaved flows, an objective unexplored in prior work, and effectively reasons about flow state in sessions to capture traffic characteristics. NetSSM accomplishes this by learning from and producing traces 8x and 78x longer, respectively, than existing transformer-based approaches. Evaluation results show that our method generates high-fidelity traces that outperform prior efforts on existing benchmarks. We also find that NetSSM's traces have high semantic similarity to real network data, both in compliance with standard protocol requirements and in flow- and session-level traffic characteristics.