Critical networking workflows require high-fidelity packet captures (PCAPs) for testing, security analysis, and protocol validation, not just flow-level statistical summaries. Recent packet generators have demonstrated protocol-constrained PCAP synthesis, but they all decode directly to raw packet fields. That interface entangles learned behavioral choices with deterministic protocol consequences, which forces packet realization to depend on post-hoc heuristic repair. We identify this decode interface as the fundamental bottleneck and present TraceCodec, a state-aware neural codec for stateful multi-flow traces. TraceCodec lifts each packet into a timed packet action with explicit flow slots and transport cues, then learns a continuous per-packet latent. A deterministic compiler lowers decoded actions back to PCAPs, owning endpoint assignment, TCP state, legality constraints, and packet rendering. The latent layer exposes a generator-facing sequence space, so downstream traffic models can operate on packet-action latents rather than raw header fields. On CICIDS2017 Monday, TraceCodec matches packet count, protocol composition, and flow population to within 0.03%. Raw-field baselines under the same non-repair policy distort flow counts and TCP state by orders of magnitude. Structural diagnostics show that TraceCodec preserves the TCP state transitions and multi-flow interleaving that raw-field decoders fragment. These results indicate that separating learned packet actions from deterministic protocol compilation is a sound foundation for high-fidelity packet-trace generation.
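To make the split between learned packet actions and deterministic protocol consequences concrete, the following is a minimal sketch of what a compiler of this kind might do. It is not TraceCodec's implementation: the names `PacketAction`, `FlowState`, and `compile_actions`, the cue vocabulary ("open", "data", "close"), the fixed endpoint and initial-sequence-number assignment, and the use of plain dictionaries in place of rendered PCAP records are all assumptions made for illustration. The decoded action only says when a packet occurs, which flow slot and direction it belongs to, and what kind of transport event it is; the compiler derives flags, sequence numbers, and acknowledgment numbers from per-slot state and rejects illegal actions instead of repairing them.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PacketAction:
    """Hypothetical timed packet action; field names are illustrative only."""
    dt: float             # gap since the previous packet, in seconds
    slot: int             # flow slot the packet belongs to
    direction: int        # 0 = client to server, 1 = server to client
    cue: str              # transport cue: "open", "data", or "close"
    payload_len: int = 0  # payload bytes (used by "data" only)


@dataclass
class FlowState:
    """Per-slot TCP bookkeeping owned by the compiler, not by the model."""
    endpoints: tuple      # (client (ip, port), server (ip, port))
    next_seq: list        # next sequence number for each direction
    opened: list = field(default_factory=lambda: [False, False])


def compile_actions(actions):
    """Lower actions to packet records with consistent seq/ack numbers."""
    flows, packets, now = {}, [], 0.0
    for a in actions:
        now += a.dt
        if a.slot not in flows:
            # Toy deterministic endpoint and ISN assignment (assumption).
            flows[a.slot] = FlowState(
                endpoints=(("10.0.0.1", 40000 + a.slot), ("10.0.0.2", 80)),
                next_seq=[1000 * (a.slot + 1), 5000 * (a.slot + 1)],
            )
        st, me, peer = flows[a.slot], a.direction, 1 - a.direction
        if a.cue == "open":
            if st.opened[me]:
                raise ValueError("illegal action: side already opened")
            # SYN if the peer has not opened yet, SYN-ACK otherwise.
            flags, consumed = ("SA" if st.opened[peer] else "S"), 1
            st.opened[me] = True
        elif a.cue in ("data", "close"):
            if not all(st.opened):
                raise ValueError("illegal action: handshake incomplete")
            if a.cue == "close":
                flags, consumed = "FA", 1   # FIN consumes one sequence number
            else:
                flags = "PA" if a.payload_len else "A"
                consumed = a.payload_len
        else:
            raise ValueError(f"unknown cue: {a.cue}")
        packets.append({
            "time": now,
            "src": st.endpoints[me],
            "dst": st.endpoints[peer],
            "flags": flags,
            "seq": st.next_seq[me],
            "ack": st.next_seq[peer] if st.opened[peer] else 0,
            "len": a.payload_len if a.cue == "data" else 0,
        })
        st.next_seq[me] += consumed
    return packets


# Two interleaved flows: slot 0 completes a handshake and sends data,
# while slot 1 opens in between.
actions = [
    PacketAction(0.00, 0, 0, "open"),
    PacketAction(0.01, 0, 1, "open"),
    PacketAction(0.01, 1, 0, "open"),
    PacketAction(0.01, 0, 0, "data"),
    PacketAction(0.02, 0, 0, "data", 100),
    PacketAction(0.01, 0, 1, "data"),
]
for p in compile_actions(actions):
    print(p["flags"], p["seq"], p["ack"], p["len"])
```

Because sequence and acknowledgment numbers never appear in the action itself, a generator working in this space cannot emit an inconsistent pair; it can only emit an action the compiler refuses. The sketch omits teardown states, retransmission, window handling, and real PCAP rendering.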