nCMD: Benign-Anchored Feature Selection for Imbalanced Network Intrusion Detection

Feature selection is critical for network intrusion detection systems (NIDS) operating under high-dimensional, highly imbalanced traffic, as found in operational and defense networks. Traditional filter methods rank features using global statistics computed symmetrically across classes and thus fail to capture the asymmetry of intrusion detection, where attacks are best characterized as deviations from dominant benign traffic. We propose benign-anchored Classwise Mean Deviation (nCMD), a lightweight and interpretable method that scores feature relevance based on the deviation of attack-class distributions from the benign-class mean, rather than a globally biased reference. This approach aligns feature selection with the operational semantics of NIDS at no additional computational cost. Across four benchmark datasets (CICIDS2017, CICDDoS2019, NSL-KDD, and UNSW-NB15), multiple feature budgets, and three downstream classifiers, nCMD matches or exceeds classical filter baselines in macro-averaged F1-score. It achieves the best result on three of the four datasets and under every classifier, with the strongest improvements observed under tight feature budgets and severe class imbalance. These results support benign-anchored ranking as a scalable and interpretable preprocessing component for resource-constrained NIDS.
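The abstract does not give the exact nCMD formula, so the following is only a minimal sketch of the benign-anchored idea it describes: score each feature by how far the attack-class means sit from the benign-class mean. The scaling by benign standard deviation, the unweighted average over attack classes, and the names `benign_anchored_scores`, `select_top_k`, and `benign_label` are all assumptions made for illustration, not the authors' definition.

```python
import numpy as np

def benign_anchored_scores(X, y, benign_label=0, eps=1e-12):
    """Illustrative benign-anchored relevance score per feature.

    ASSUMPTION: score = mean over attack classes of
    |attack-class mean - benign mean| / benign std.
    The paper's actual nCMD definition may differ.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)

    benign = X[y == benign_label]
    mu_benign = benign.mean(axis=0)            # the anchor: benign-class mean
    sigma_benign = benign.std(axis=0) + eps    # scale; eps avoids division by zero

    attack_labels = [c for c in np.unique(y) if c != benign_label]
    scores = np.zeros(X.shape[1])
    for c in attack_labels:
        mu_attack = X[y == c].mean(axis=0)
        scores += np.abs(mu_attack - mu_benign) / sigma_benign

    # Unweighted average so that rare attack classes count as much as common ones.
    return scores / max(len(attack_labels), 1)

def select_top_k(scores, k):
    """Indices of the k highest-scoring features (the feature budget)."""
    return np.argsort(scores)[::-1][:k]

if __name__ == "__main__":
    # Synthetic, heavily imbalanced data: only feature 0 shifts under attack.
    rng = np.random.default_rng(0)
    X_benign = rng.normal(size=(1000, 3))
    X_attack_a = rng.normal(size=(20, 3))
    X_attack_a[:, 0] += 5.0
    X_attack_b = rng.normal(size=(10, 3))
    X_attack_b[:, 0] -= 5.0

    X = np.vstack([X_benign, X_attack_a, X_attack_b])
    y = np.array([0] * 1000 + [1] * 20 + [2] * 10)

    scores = benign_anchored_scores(X, y)
    print("top feature:", select_top_k(scores, 1))
```

Anchoring on the benign mean rather than the global mean matters little on balanced data, but under severe imbalance the two references diverge in interpretation: the score here reads directly as "how far does this attack class sit from normal traffic", and two attack classes that shift a feature in opposite directions both contribute instead of partly cancelling.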

