High-dimensional datasets pose a challenge for learning tasks in data mining and machine learning. Feature selection is an effective technique for dimensionality reduction and is often an essential preprocessing step before a learning algorithm is applied. Over the decades, filter feature selection methods have evolved from simple univariate relevance ranking algorithms, to more sophisticated relevance-redundancy trade-offs, and in recent years to approaches based on multivariate dependencies. This trend toward capturing multivariate dependence aims to obtain unique information about the class from the intercooperation among features. This paper presents a comprehensive survey of state-of-the-art filter feature selection methods assisted by feature intercooperation and summarizes the contributions of the different approaches found in the literature. Furthermore, current issues and challenges are discussed to identify promising directions for future research and development.
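To make the idea of feature intercooperation concrete, the following minimal sketch uses a toy XOR dataset in which each feature alone carries no information about the class, while the two features together determine it completely. It is an illustration only, not a method from the surveyed literature: the function name `mutual_information`, the variable names, and the four-sample dataset are all assumptions chosen for this example, and the estimator is a simple plug-in (empirical frequency) estimate.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples.

    Hypothetical helper for illustration; xs and ys are equal-length
    sequences of hashable values.
    """
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    # Sum p(x, y) * log2( p(x, y) / (p(x) * p(y)) ) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (count_x[x] * count_y[y]))
        for (x, y), c in joint.items()
    )


# Toy dataset (assumed for illustration): the class is the XOR of two features.
f1 = [0, 0, 1, 1]
f2 = [0, 1, 0, 1]
label = [a ^ b for a, b in zip(f1, f2)]

# Univariate relevance: each feature looks useless on its own.
mi_f1 = mutual_information(f1, label)
mi_f2 = mutual_information(f2, label)

# Multivariate dependence: the feature pair, treated as one joint variable.
mi_joint = mutual_information(list(zip(f1, f2)), label)

# Information available only through cooperation of the two features.
interaction_gain = mi_joint - mi_f1 - mi_f2

print(f"I(f1; y) = {mi_f1:.3f} bits")
print(f"I(f2; y) = {mi_f2:.3f} bits")
print(f"I(f1, f2; y) = {mi_joint:.3f} bits")
print(f"interaction gain = {interaction_gain:.3f} bits")
```

In this sketch a univariate relevance ranking would score both features at zero and discard them, whereas a criterion that evaluates the pair jointly recovers the full bit of class information. Real data would need a more careful estimator than the plug-in counts used here.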