In many real-world network environments, several types of cyberattacks occur at very low rates compared to benign traffic, making them difficult for intrusion detection systems (IDS) to detect reliably. Under this imbalance, traditional evaluation metrics such as accuracy often overstate model performance, masking failures on the minority attack classes that matter most in practice. In this paper, we evaluate a set of base and meta classifiers on low-traffic attacks in the CSE-CIC-IDS2017 dataset and compare their reliability in terms of accuracy and the Matthews Correlation Coefficient (MCC). The results show that accuracy consistently inflates performance, while MCC gives a more faithful assessment of a classifier's performance across both majority and minority classes. Meta-classification methods, such as LogitBoost and AdaBoost, demonstrate more effective minority-class detection when measured by MCC, revealing trends that accuracy fails to capture. These findings establish the need for imbalance-aware evaluation and support MCC as a more trustworthy metric for IDS research involving low-traffic cyberattacks.
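The gap between the two metrics can be illustrated with a small sketch. The confusion-matrix counts below are hypothetical and are not taken from the paper's experiments; they only assume a trace of 10,000 flows in which 50 are attacks. The helper names `accuracy` and `mcc` are likewise illustrative, and returning 0 when the MCC denominator vanishes is a common convention rather than something the paper specifies.

```python
import math


def accuracy(tp, tn, fp, fn):
    """Fraction of all flows that were labelled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews Correlation Coefficient for a binary confusion matrix.

    Returns 0.0 when any marginal sum is zero (assumed convention).
    """
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical classifier A: labels every flow as benign.
clf_a = dict(tp=0, tn=9950, fp=0, fn=50)

# Hypothetical classifier B: catches 40 of the 50 attacks,
# at the cost of 60 false alarms.
clf_b = dict(tp=40, tn=9890, fp=60, fn=10)

for name, cm in (("A", clf_a), ("B", clf_b)):
    print(f"{name}: accuracy={accuracy(**cm):.4f}  MCC={mcc(**cm):.4f}")
```

With these assumed counts, accuracy ranks the classifier that never detects an attack (0.995) slightly above the one that detects most of them (0.993), while MCC separates them clearly (0 versus roughly 0.56).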