This paper presents a hardware-efficient deep neural network (DNN), optimized through hardware-aware neural architecture search (HW-NAS), for classifying session-level encrypted traffic on resource-constrained Internet of Things (IoT) and edge devices. Using HW-NAS, a 1D convolutional neural network (CNN) is tailored to the ISCX VPN-nonVPN dataset to meet strict memory and computational limits while achieving robust performance. The optimized model attains 96.60% accuracy with just 88.26K parameters, 10.08M FLOPs, and a maximum tensor size of 20.12K. Compared to state-of-the-art models, it achieves reductions of up to 444-fold, 312-fold, and 15-fold in these metrics, respectively, minimizing memory footprint and runtime requirements. The model also achieves up to 99.86% accuracy across multiple VPN and traffic classification (TC) tasks, and it generalizes to external benchmarks with up to 99.98% accuracy on USTC-TFC and QUIC NetFlow. In addition, an in-depth study of header-level preprocessing confirms that the optimized model maintains its performance across a wide range of configurations, even in scenarios with stricter privacy considerations. Likewise, reducing session length by up to 75% yields significant efficiency improvements while maintaining high accuracy, with a drop of only 1-2%. However, careful preprocessing and session length selection remain important when classifying raw traffic data, as improper settings or aggressive reductions can reduce accuracy by 7%. The quantized architecture was deployed on STM32 microcontrollers and evaluated across input sizes; the results confirm that the efficiency gains from shorter sessions translate to practical, low-latency embedded inference. These findings demonstrate the method's practicality for encrypted traffic analysis in constrained IoT networks.
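To put the reported figures in perspective, the following is a rough back-of-the-envelope sketch, not part of the paper's method. It computes the largest baseline sizes implied by the "up to" reduction factors and a raw storage estimate for the weights and the largest tensor. Two assumptions are made that the abstract does not state: the quantized model stores one byte per value (int8), and the "maximum tensor size" is an element count. The helper names `implied_baseline` and `storage_bytes` are hypothetical.

```python
# Figures quoted in the abstract (K = 1e3, M = 1e6).
PARAMS = 88.26e3       # number of parameters
FLOPS = 10.08e6        # FLOPs per inference
MAX_TENSOR = 20.12e3   # maximum tensor size (ASSUMPTION: counted in elements)

# "Up to" reduction factors quoted in the abstract.
REDUCTION = {"params": 444, "flops": 312, "max_tensor": 15}


def implied_baseline(value: float, fold: float) -> float:
    """Upper bound on the compared model's size implied by an 'up to N-fold' reduction."""
    return value * fold


def storage_bytes(num_elements: float, bytes_per_element: int) -> float:
    """Raw storage for a set of values, ignoring any runtime or format overhead."""
    return num_elements * bytes_per_element


baseline_params = implied_baseline(PARAMS, REDUCTION["params"])
baseline_flops = implied_baseline(FLOPS, REDUCTION["flops"])
baseline_tensor = implied_baseline(MAX_TENSOR, REDUCTION["max_tensor"])

# ASSUMPTION: int8 quantization (1 byte per value); float32 shown for comparison.
weights_int8_kib = storage_bytes(PARAMS, 1) / 1024
weights_fp32_kib = storage_bytes(PARAMS, 4) / 1024
peak_tensor_int8_kib = storage_bytes(MAX_TENSOR, 1) / 1024

if __name__ == "__main__":
    print(f"Implied largest baseline: {baseline_params / 1e6:.2f}M parameters")
    print(f"Implied largest baseline: {baseline_flops / 1e9:.2f}G FLOPs")
    print(f"Implied largest baseline: {baseline_tensor / 1e3:.1f}K max tensor size")
    print(f"Weights, int8 (assumed):  {weights_int8_kib:.1f} KiB")
    print(f"Weights, float32:         {weights_fp32_kib:.1f} KiB")
    print(f"Largest tensor, int8:     {peak_tensor_int8_kib:.1f} KiB")
```

Under these assumptions, both the weights and the largest single tensor stay below 100 KiB, which is consistent with the claim that the model suits microcontroller-class devices. Actual flash and RAM usage on a given STM32 part would also depend on the inference runtime and its buffer allocation, which this sketch ignores.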