The purpose of this study is to introduce a new approach to feature ranking for classification tasks, referred to below as greedy feature selection. In statistical learning, feature selection is usually carried out by methods that are independent of the classifier later used to make predictions from the reduced set of features. Greedy feature selection instead identifies, at each step, the most important feature according to the selected classifier. In the paper, the benefits of such a scheme are investigated theoretically in terms of model capacity indicators, such as the Vapnik-Chervonenkis (VC) dimension or the kernel alignment, and tested numerically on the problem of predicting geo-effective manifestations of the active Sun.
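To make the idea of classifier-dependent, step-by-step ranking concrete, the following is a minimal sketch of a generic greedy forward selection loop. It is an illustration only and not the procedure of the paper: the nearest-centroid classifier, the use of training accuracy as the selection criterion, the toy data, and the names `nearest_centroid_accuracy` and `greedy_feature_ranking` are all assumptions introduced here.

```python
from typing import Callable, List, Sequence

Row = Sequence[float]


def nearest_centroid_accuracy(X: Sequence[Row], y: Sequence[int],
                              features: List[int]) -> float:
    """Training accuracy of a nearest-centroid classifier restricted to
    the given feature indices (assumed stand-in for 'the selected classifier')."""
    if not features:
        return 0.0
    # One centroid per class, computed only on the chosen features.
    centroids = {}
    for label in sorted(set(y)):
        rows = [x for x, t in zip(X, y) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for x, t in zip(X, y):
        # Predict the class whose centroid is closest in squared Euclidean distance.
        pred = min(
            centroids,
            key=lambda lab: sum((x[j] - c) ** 2
                                for j, c in zip(features, centroids[lab])),
        )
        correct += int(pred == t)
    return correct / len(y)


def greedy_feature_ranking(
    X: Sequence[Row],
    y: Sequence[int],
    score: Callable[[Sequence[Row], Sequence[int], List[int]], float]
    = nearest_centroid_accuracy,
) -> List[int]:
    """Rank features by repeatedly adding the one that gives the best
    classifier score when combined with the features already selected."""
    remaining = list(range(len(X[0])))
    ranking: List[int] = []
    while remaining:
        # The classifier itself decides which feature is most useful next.
        best = max(remaining, key=lambda j: score(X, y, ranking + [j]))
        ranking.append(best)
        remaining.remove(best)
    return ranking


# Toy data (hypothetical): feature 1 separates the two classes cleanly,
# features 0 and 2 are mostly noise.
X = [
    [0.5, 0.0, 0.2],
    [0.1, 0.1, 0.9],
    [0.9, 0.2, 0.4],
    [0.3, 1.0, 0.6],
    [0.7, 0.9, 0.1],
    [0.2, 1.1, 0.8],
]
y = [0, 0, 0, 1, 1, 1]

ranking = greedy_feature_ranking(X, y)
print(ranking)  # feature 1 is ranked first
```

In this sketch the score is computed on the training data only to keep the example short; a held-out or cross-validated score would be the more usual choice, and any classifier exposing a comparable score function could be passed in place of the nearest-centroid one.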