The effectiveness of machine learning models is strongly affected by the size of the dataset and the quality of its features, since redundant and irrelevant features can severely degrade performance. This paper proposes IGRF-RFE, a hybrid feature selection method for multi-class network anomaly detection with a multilayer perceptron (MLP) network. IGRF-RFE is a feature reduction technique that combines a filter feature selection method with a wrapper feature selection method. First, we apply a filter method that combines Information Gain and Random Forest importance to shrink the feature subset search space. Then, we apply recursive feature elimination (RFE) as a wrapper method to recursively remove further redundant features from the reduced feature subset. Experimental results on the UNSW-NB15 dataset confirm that the proposed method improves anomaly detection accuracy while reducing feature dimensionality: the number of features drops from 42 to 23, and the multi-class classification accuracy of the MLP rises from 82.25% to 84.24%.
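The two-stage pipeline described above (a filter stage followed by a wrapper stage) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The abstract does not say how Information Gain and Random Forest importance are combined, so the sketch assumes the mean of min-max-normalized scores. The Random Forest importances are hypothetical numbers instead of a trained forest. In the RFE stage, Information Gain stands in for model-derived importance, and a majority-vote lookup table stands in for MLP accuracy. All function names (`filter_stage`, `rfe_stage`, `lookup_accuracy`) are hypothetical.

```python
import math
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Scores = Dict[str, float]


def entropy(labels: Sequence) -> float:
    """Shannon entropy H(Y) in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(column: Sequence, labels: Sequence) -> float:
    """IG(Y; X) = H(Y) - H(Y | X) for one discrete feature column X."""
    n = len(labels)
    conditional = 0.0
    for value in set(column):
        subset = [y for x, y in zip(column, labels) if x == value]
        conditional += len(subset) / n * entropy(subset)
    return entropy(labels) - conditional


def min_max(scores: Scores) -> Scores:
    """Rescale scores to [0, 1] so IG and RF importance are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {f: (s - lo) / span for f, s in scores.items()}


def filter_stage(ig: Scores, rf: Scores, keep: int) -> List[str]:
    """Filter step (assumed combination): mean of normalized IG and RF scores."""
    ig_n, rf_n = min_max(ig), min_max(rf)
    combined = {f: (ig_n[f] + rf_n[f]) / 2 for f in ig}
    return sorted(combined, key=combined.get, reverse=True)[:keep]


def rfe_stage(features: List[str],
              rank: Callable[[List[str]], Scores],
              evaluate: Callable[[List[str]], float],
              min_features: int = 1) -> Tuple[List[str], float]:
    """Wrapper step: drop the least important feature each round, re-ranking
    on the remaining subset, and return the best-scoring subset seen."""
    current = list(features)
    best, best_score = list(current), evaluate(current)
    while len(current) > min_features:
        importances = rank(current)          # re-rank on the current subset
        weakest = min(current, key=importances.get)
        current = [f for f in current if f != weakest]
        score = evaluate(current)
        if score >= best_score:              # ties favour the smaller subset
            best, best_score = list(current), score
    return best, best_score


# Toy data: four records, four discrete features (hypothetical).
labels = [0, 0, 1, 1]
columns = {"f0": [0, 0, 1, 1], "f1": [0, 1, 0, 1],
           "f2": [0, 0, 0, 1], "f3": [5, 5, 5, 5]}
ig = {f: information_gain(col, labels) for f, col in columns.items()}
rf = {"f0": 0.5, "f1": 0.05, "f2": 0.3, "f3": 0.15}  # hypothetical RF importances


def lookup_accuracy(subset: List[str]) -> float:
    """Hypothetical stand-in for MLP accuracy: majority-vote lookup table."""
    keys = list(zip(*(columns[f] for f in subset)))
    votes: Dict[tuple, Counter] = {}
    for k, y in zip(keys, labels):
        votes.setdefault(k, Counter())[y] += 1
    correct = sum(votes[k].most_common(1)[0][0] == y for k, y in zip(keys, labels))
    return correct / len(labels)


reduced = filter_stage(ig, rf, keep=3)
selected, acc = rfe_stage(reduced, lambda s: {f: ig[f] for f in s}, lookup_accuracy)
print(reduced, selected, acc)  # ['f0', 'f2', 'f3'] ['f0'] 1.0
```

The filter stage is cheap because it scores each feature once, so it can discard clearly weak features (here `f1`) before the more expensive wrapper stage. The wrapper stage re-evaluates the classifier after every elimination, which is why it runs only on the reduced subset.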