Neurosymbolic Learning for Advanced Persistent Threat Detection under Extreme Class Imbalance

The growing deployment of Internet of Things (IoT) devices in smart cities and industrial environments increases their exposure to stealthy, multi-stage advanced persistent threats (APTs) that exploit wireless communication. Detecting these threats is difficult: severe class imbalance in network traffic limits the effectiveness of traditional deep learning approaches, which also lack explainability in their classification decisions. To address these challenges, this paper proposes a neurosymbolic architecture that integrates an optimized BERT model with logic tensor networks (LTNs) for explainable APT detection in wireless IoT networks. The method is designed for mobile IoT environments, using efficient feature encoding that transforms network flow data into BERT-compatible sequences while preserving the temporal dependencies critical for identifying APT stages. Severe class imbalance is mitigated by focal loss, hierarchical classification that separates normal-traffic detection from attack categorization, and adaptive sampling strategies. On the SCVIC-APT2021 dataset, the approach achieves a binary classification F1 score of 95.27% with a false positive rate of 0.14%, which is viable for operational deployment, and a macro F1 score of 76.75% for multi-class attack categorization. Furthermore, a novel explainability analysis statistically validates the importance of distinct network features. These results demonstrate that neurosymbolic learning enables high-performance, interpretable, and operationally viable APT detection for IoT network monitoring architectures.
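Focal loss is named above as one of the class-imbalance mitigations, but its form is not stated. The sketch below shows the standard multi-class focal loss for a single sample, FL(p_t) = -α_t (1 - p_t)^γ log(p_t), computed from softmax outputs. The function names, the default γ = 2 and the optional per-class weights α are illustrative assumptions, not the paper's settings or implementation.

```python
import math
from typing import Optional, Sequence


def softmax(logits: Sequence[float]) -> list[float]:
    """Numerically stable softmax over one vector of class logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def focal_loss(logits: Sequence[float], target: int, gamma: float = 2.0,
               alpha: Optional[Sequence[float]] = None) -> float:
    """Multi-class focal loss for one sample: -alpha_t * (1 - p_t)^gamma * log(p_t).

    gamma (assumed default 2.0) down-weights well-classified samples.
    alpha is a hypothetical optional per-class weight vector, e.g. larger
    weights for rare APT stages. With gamma = 0 and alpha = None this is
    ordinary cross-entropy.
    """
    p_t = softmax(logits)[target]
    alpha_t = 1.0 if alpha is None else alpha[target]
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


if __name__ == "__main__":
    easy = [4.0, 0.0, 0.0]  # confident, correct prediction for class 0
    hard = [0.0, 0.5, 0.0]  # uncertain prediction for class 0
    for name, logits in (("easy", easy), ("hard", hard)):
        ce = focal_loss(logits, target=0, gamma=0.0)  # gamma = 0 gives cross-entropy
        fl = focal_loss(logits, target=0)
        print(f"{name}: CE={ce:.4f}  FL={fl:.4f}  FL/CE={fl / ce:.4f}")
```

Raising γ shrinks the loss on confidently classified samples (in this setting, typically the abundant normal traffic) far more than on uncertain ones, so training gradients are dominated by the hard, often minority-class, flows.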
