An Adaptive Neighborhood Partition Full Conditional Mutual Information Maximization Method for Feature Selection

Feature selection eliminates redundant features and keeps relevant ones, which can improve the performance of machine learning algorithms and speed up computation. Among the various approaches, mutual information has attracted increasing attention because it is an effective criterion for measuring the correlation between variables. However, current work mainly focuses on maximizing the relevancy between features and the class label while minimizing the redundancy among the selected features. We argue that minimizing feature redundancy is reasonable but not necessary, because some so-called redundant features still carry useful information that can improve performance. As for mutual information calculation, an improper neighborhood partition may distort the true relationship between two variables, yet traditional methods usually split continuous variables into a few intervals or even ignore this influence altogether. We theoretically prove how variable fluctuation negatively affects mutual information calculation. To remove these obstacles, we propose a full conditional mutual information maximization method (FCMIM) for feature selection, which considers only feature relevancy, from two aspects. To obtain a better partition and eliminate the negative influence of attribute fluctuation, we propose an adaptive neighborhood partition algorithm (ANP) driven by feedback from the mutual information maximization algorithm; this backpropagation process helps search for a proper neighborhood partition parameter. We compare our method with several mutual information methods on 17 benchmark datasets. FCMIM outperforms the other methods across different classifiers, and the results show that ANP improves the performance of nearly all the mutual information methods.
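To illustrate why the partition of a continuous variable matters, the sketch below discretizes one feature with simple equal-width intervals and computes a plug-in estimate of its mutual information with a class label. This is not FCMIM or ANP; the helper names `equal_width_bins` and `mutual_information` and the toy data are hypothetical, chosen only to show that the same feature can look irrelevant, partially relevant, or fully relevant depending on how many intervals are used.

```python
import math
from collections import Counter


def equal_width_bins(values, n_bins):
    """Map each continuous value to one of n_bins equal-width intervals (hypothetical helper)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0] * len(values)
    width = (hi - lo) / n_bins
    # Clamp so the maximum value falls into the last interval instead of a new one.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


def mutual_information(xs, ys):
    """Plug-in estimate of I(X;Y) in bits for two discrete sequences of equal length."""
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) p(y)) ), with counts in place of probabilities
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


# Toy data (assumed for illustration): one continuous feature and a binary class label.
x = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
y = [0, 0, 0, 0, 1, 1, 1, 1]

for n_bins in (1, 2, 3):
    mi = mutual_information(equal_width_bins(x, n_bins), y)
    print(n_bins, round(mi, 3))
# 1 0.0
# 2 1.0
# 3 0.75
```

With one interval the feature appears useless, with two it determines the label completely, and with three one interval mixes both classes, so the estimate drops to 0.75 bits. A fixed, hand-picked partition can therefore misstate a feature's relevancy, which is the kind of sensitivity an adaptive choice of the partition parameter aims to address.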

翻译：特征选择用于消除冗余特征并保留相关特征，它能提升机器学习算法的性能并加速计算速度。在各种方法中，互信息作为衡量变量相关性的有效准则，受到越来越多的关注。然而，当前研究主要集中于最大化特征与类标签的相关性以及最小化已选特征之间的冗余性。我们认为追求特征冗余最小化虽合理但并非必要，因为部分所谓的冗余特征也携带了有助于提升性能的有用信息。在互信息计算方面，若缺乏合适的邻域划分，可能会扭曲两个变量之间的真实关系。传统方法通常将连续变量分割成若干区间，甚至忽略这种影响。我们从理论上证明了变量波动对互信息计算的负面影响。为消除上述障碍，我们提出了一种全条件互信息最大化方法（FCMIM），该方法仅从两个角度考虑特征相关性。为获得更优的划分效果并消除属性波动的负面影响，我们提出了一种自适应邻域划分算法（ANP），该算法以互信息最大化算法的反馈为基础，通过反向传播过程帮助搜索合适的邻域划分参数。我们在17个基准数据集上将该方法与多种互信息方法进行了比较。基于不同分类器的实验结果表明，FCMIM的结果优于其他方法。同时，ANP确实提升了几乎所有互信息方法的性能。