PACC: Protocol-Aware Cross-Layer Compression for Compact Network Traffic Representation

Network traffic classification is a core primitive for network security and management, yet it is increasingly challenged by pervasive encryption and evolving protocols. A central bottleneck is representation: hand-crafted flow statistics are efficient but often too lossy, raw-bit encodings can be accurate but are costly, and recent pre-trained embeddings provide transfer but frequently flatten the protocol stack and entangle signals across layers. We observe that real traffic contains substantial redundancy both across network layers and within each layer; existing paradigms do not explicitly identify and remove this redundancy, leading to wasted capacity, shortcut learning, and degraded generalization. To address this, we propose PACC, a redundancy-aware, layer-aware representation framework. PACC treats the protocol stack as multi-view inputs and learns compact layer-wise projections that remain faithful to each layer while explicitly factorizing representations into shared (cross-layer) and private (layer-specific) components. We operationalize these goals with a joint objective that preserves layer-specific information via reconstruction, captures shared structure via contrastive mutual-information learning, and maximizes task-relevant information via supervised losses, yielding compact latents suitable for efficient inference. Across datasets covering encrypted application classification, IoT device identification, and intrusion detection, PACC consistently outperforms feature-engineered and raw-bit baselines, with up to a 12.9% accuracy improvement over nPrint on encrypted subsets. It also matches or surpasses strong foundation-model baselines while improving end-to-end efficiency by up to 3.16x.
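To make the three-term joint objective concrete, the following is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The abstract does not specify the architecture or loss weights, so everything here is illustrative: linear maps stand in for the per-layer encoders and decoders, InfoNCE is used as one common contrastive lower bound on mutual information, and the layer names, the shared/private split by column index, the fusion by averaging shared codes, and the weights `w_rec`, `w_con`, `w_sup` are all hypothetical.

```python
import numpy as np


def mse(x, x_hat):
    """Mean squared reconstruction error for one layer view."""
    return float(np.mean((x - x_hat) ** 2))


def log_softmax(logits):
    """Row-wise log-softmax, shifted for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))


def info_nce(a, b, temperature=0.1):
    """InfoNCE between two batches of shared codes.

    Row i of `a` and row i of `b` come from the same packet/flow
    (positive pair); all other rows in the batch act as negatives.
    """
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    log_prob = log_softmax(a @ b.T / temperature)
    return float(-np.mean(np.diag(log_prob)))


def cross_entropy(logits, labels):
    """Mean cross-entropy for integer class labels."""
    log_prob = log_softmax(logits)
    return float(-np.mean(log_prob[np.arange(len(labels)), labels]))


def joint_loss(views, enc, dec, clf, labels, d_shared,
               w_rec=1.0, w_con=1.0, w_sup=1.0):
    """Hypothetical joint objective: reconstruction + contrastive + supervised.

    views: dict layer_name -> (batch, d_in) array, one view per protocol layer
    enc:   dict layer_name -> (d_in, d_latent) linear encoder (assumption)
    dec:   dict layer_name -> (d_latent, d_in) linear decoder (assumption)
    clf:   (d_shared + n_layers * d_private, n_classes) linear classifier
    """
    names = list(views)
    shared, private, rec = {}, [], 0.0
    for name in names:
        x = views[name]
        z = x @ enc[name]                    # compact layer-wise latent
        shared[name] = z[:, :d_shared]       # cross-layer component
        private.append(z[:, d_shared:])      # layer-specific component
        rec += mse(x, z @ dec[name])         # keep each layer's information

    # Align shared components across every pair of layers.
    con = sum(info_nce(shared[p], shared[q])
              for i, p in enumerate(names) for q in names[i + 1:])

    # Fuse: one averaged shared code plus every private code.
    fused = np.concatenate(
        [np.mean([shared[n] for n in names], axis=0)] + private, axis=1)
    sup = cross_entropy(fused @ clf, labels)

    total = w_rec * rec + w_con * con + w_sup * sup
    return total, {"rec": rec, "con": con, "sup": sup}
```

Storing the shared code once rather than once per layer is what makes the fused latent smaller than a plain concatenation of per-layer embeddings. In a real training loop the three terms would be minimized jointly with an autodiff framework; this sketch only evaluates the loss.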
