Intrusion detection systems (IDSs) are essential elements of IT systems. Their key component is a classification module that continuously evaluates selected features of the network traffic and identifies possible threats. The efficiency of this module depends heavily on choosing the right features to monitor. Identifying a minimal set of features that is sufficient to reliably distinguish malicious traffic from benign traffic is therefore an indispensable step in the development of an IDS. This paper presents a preprocessing and feature selection workflow, together with its results, for the CSE-CIC-IDS2018 on AWS dataset, focusing on five attack types. To identify the relevant features, six feature selection methods were applied, and the final ranking of the features was derived from their average score across these methods. Next, several feature subsets were formed using different ranking threshold values, and each subset was evaluated with five classification algorithms to determine the optimal feature set for each attack type. Four widely used metrics were taken into consideration during the evaluation.
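The ranking and thresholding steps described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not name the six selection methods or say how their scores are made comparable, so the min-max normalization, the method names (`method_a`, `method_b`), the feature names, and all score values below are assumptions made up purely for illustration.

```python
from statistics import mean


def min_max_normalize(scores):
    """Rescale one method's scores to [0, 1] (an assumed normalization)."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All features scored equally; no feature stands out for this method.
        return {feature: 0.0 for feature in scores}
    return {feature: (s - lo) / (hi - lo) for feature, s in scores.items()}


def rank_by_average_score(per_method_scores):
    """Average the normalized scores over all methods and sort descending."""
    normalized = [min_max_normalize(s) for s in per_method_scores.values()]
    features = list(normalized[0].keys())
    averaged = {f: mean(n[f] for n in normalized) for f in features}
    return sorted(averaged.items(), key=lambda item: item[1], reverse=True)


def subsets_by_threshold(ranking, thresholds):
    """For each threshold, keep the features whose average score reaches it."""
    return {t: [f for f, score in ranking if score >= t] for t in thresholds}


# Hypothetical scores from two hypothetical selection methods (made-up data).
toy_scores = {
    "method_a": {"flow_duration": 8.0, "pkt_len_mean": 3.0, "fwd_iat_std": 0.0},
    "method_b": {"flow_duration": 30.0, "pkt_len_mean": 50.0, "fwd_iat_std": 10.0},
}

ranking = rank_by_average_score(toy_scores)
subsets = subsets_by_threshold(ranking, thresholds=[0.25, 0.75])

for threshold, selected in subsets.items():
    print(threshold, selected)
```

In a full workflow, each subset produced this way would then be passed to the classifiers and compared on the chosen evaluation metrics. That step is omitted here because the abstract does not specify the algorithms or metrics.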