Feature selection is crucial for pinpointing relevant features in high-dimensional datasets, mitigating the 'curse of dimensionality,' and enhancing machine learning performance. Traditional feature selection methods for classification use data from all classes to select features for each class. This paper explores feature selection methods that select features for each class separately, using class models based on low-rank generative methods and introducing a signal-to-noise ratio (SNR) feature selection criterion. This novel approach offers theoretical guarantees of true-feature recovery under certain assumptions and is shown to outperform some existing feature selection methods on standard classification datasets.
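To make the class-specific selection idea concrete, here is a minimal sketch of per-class SNR-style feature scoring. The SNR definition used below (absolute mean difference between one class and the rest, divided by the summed within-group standard deviations) is a common generic choice and an assumption for illustration only; the paper's exact criterion and low-rank class models may differ.

```python
import numpy as np

def per_class_snr_scores(X, y, cls):
    # Illustrative per-class SNR: |mean(class) - mean(rest)| divided by
    # the sum of the two groups' standard deviations (plus a small
    # constant for numerical stability). Assumed definition, not the
    # paper's exact criterion.
    in_c = (y == cls)
    mu_in, mu_out = X[in_c].mean(axis=0), X[~in_c].mean(axis=0)
    sd_in, sd_out = X[in_c].std(axis=0), X[~in_c].std(axis=0)
    return np.abs(mu_in - mu_out) / (sd_in + sd_out + 1e-12)

def select_features_per_class(X, y, k):
    # Unlike traditional methods that pick one shared feature subset,
    # this selects the top-k scoring features separately for each class.
    return {c: np.argsort(per_class_snr_scores(X, y, c))[::-1][:k]
            for c in np.unique(y)}
```

For example, on a dataset where only the first feature separates the two classes, `select_features_per_class(X, y, k=1)` would return feature index 0 for each class.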