The fidelity and utility of synthetic network traffic are critically compromised by architectural mismatch across heterogeneous network datasets and by prevalent scalability failures. This study addresses both problems by establishing an Architectural Selection Framework that empirically quantifies how data structure compatibility dictates the optimal fidelity-utility trade-off. We systematically evaluate twelve generative architectures, both AI-based and non-AI, on two datasets with distinct data structures: the categorical-heavy NSL-KDD and the continuous-flow-heavy CIC-IDS2017. Fidelity is assessed through three structural metrics (Data Structure, Correlation, and Probability Distribution Difference) to confirm structural realism before downstream utility is evaluated. Our results, confirmed over twenty independent runs (N=20), show that GAN-based models (CTGAN, CopulaGAN) exhibit superior architectural robustness, consistently achieving the best balance of statistical fidelity and practical utility. Conversely, the framework exposes two critical failure modes: statistical methods sacrifice structural fidelity for utility (compromised fidelity), and modern iterative architectures, such as diffusion models, face prohibitive computational costs that render them impractical for large-scale security deployment. This work gives security practitioners an evidence-based guide for mitigating these architectural failures and sets a benchmark for reliable, scalable synthetic data deployment in adaptive security solutions.
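The abstract names Correlation and Probability Distribution Difference among its fidelity metrics but does not define them. As a rough illustration only, the sketch below assumes one plausible reading: the mean absolute gap between pairwise Pearson correlations, and the total variation distance between histograms on shared bins. The function names, the bin count, and the handling of constant columns are hypothetical choices for this example, not the paper's definitions, and the Data Structure metric is not sketched.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 0.0  # assumption: a constant column counts as uncorrelated
    return cov / (sx * sy)


def correlation_difference(real, synth):
    """Mean absolute gap in pairwise Pearson correlation.

    `real` and `synth` are dicts mapping column name -> list of numbers,
    with the same column names in both (an assumption of this sketch).
    """
    pairs = list(combinations(sorted(real), 2))
    if not pairs:
        return 0.0  # fewer than two columns: nothing to compare
    gaps = [
        abs(pearson(real[a], real[b]) - pearson(synth[a], synth[b]))
        for a, b in pairs
    ]
    return sum(gaps) / len(gaps)


def distribution_difference(real_col, synth_col, bins=10):
    """Total variation distance between histograms on shared bin edges."""
    lo = min(min(real_col), min(synth_col))
    hi = max(max(real_col), max(synth_col))
    if hi == lo:
        return 0.0  # both columns hold a single identical value
    width = (hi - lo) / bins

    def hist(col):
        counts = [0] * bins
        for v in col:
            # Clamp so the maximum value falls into the last bin.
            counts[min(int((v - lo) / width), bins - 1)] += 1
        return [c / len(col) for c in counts]

    p, q = hist(real_col), hist(synth_col)
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))
```

Both functions return 0 when the synthetic data matches the real data on the compared statistic, and larger values indicate a wider gap. Under this reading, `distribution_difference` would be applied per continuous feature and `correlation_difference` once per table, after encoding categorical features numerically.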