We propose a new model-free feature screening method based on energy distances for ultrahigh-dimensional binary classification problems. With high probability, the proposed method retains all relevant features while discarding all noise variables. We also extend the screening method to identify pairs of variables that are marginally undetectable but whose joint distributions differ between the two classes. Finally, we build a classifier whose discrimination rule is coherent with the proposed feature selection criteria, and we establish its risk consistency. An extensive numerical study on simulated and real benchmark data sets shows clear advantages of the proposed method over state-of-the-art methods.
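The abstract does not state the screening statistic explicitly, so the following is only a minimal sketch under stated assumptions. It assumes that each feature is scored by the empirical (V-statistic) two-sample energy distance between its class-0 and class-1 marginals, and that the pairwise extension scores a pair of features by the energy distance between their bivariate class-conditional distributions. The energy distance 2E||A - B|| - E||A - A'|| - E||B - B'|| is zero exactly when the two distributions coincide (given finite first moments), which is what makes such a criterion model-free. The function names, the top-k cutoff (in place of a data-driven threshold), and restricting the pair search to features not kept marginally are hypothetical illustrative choices, not the authors' procedure.

```python
import itertools

import numpy as np


def energy_distance(a, b):
    """Empirical (V-statistic) energy distance between samples a (n, d) and b (m, d).

    Estimates 2 E||A - B|| - E||A - A'|| - E||B - B'|| with Euclidean norms;
    1-D inputs are treated as d = 1.
    """
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a = a.reshape(a.shape[0], -1)
    b = b.reshape(b.shape[0], -1)

    def mean_dist(u, v):
        # Mean Euclidean distance over all (u_i, v_j) pairs via broadcasting.
        diff = u[:, None, :] - v[None, :, :]
        return np.sqrt((diff ** 2).sum(axis=-1)).mean()

    return 2.0 * mean_dist(a, b) - mean_dist(a, a) - mean_dist(b, b)


def marginal_screen(X, y, k):
    """Hypothetical marginal screen: score each feature by the energy distance
    between its class-0 and class-1 marginals; return the top-k indices and all scores."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    scores = np.array([energy_distance(X0[:, j], X1[:, j]) for j in range(X.shape[1])])
    return np.argsort(scores)[::-1][:k], scores


def pair_screen(X, y, candidates, k):
    """Hypothetical pair screen: score each pair of candidate features by the
    energy distance between the bivariate class-conditional distributions."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    X0, X1 = X[y == 0], X[y == 1]
    cand = sorted(int(c) for c in candidates)
    scored = []
    for j, l in itertools.combinations(cand, 2):
        cols = [j, l]
        scored.append(((j, l), energy_distance(X0[:, cols], X1[:, cols])))
    scored.sort(key=lambda t: t[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Synthetic illustration (not data from the paper).
    rng = np.random.default_rng(0)
    n, p, rho = 200, 10, 0.9
    X0 = rng.standard_normal((n, p))
    X1 = rng.standard_normal((n, p))
    X1[:, 0] += 2.0  # feature 0: mean shift, detectable marginally
    # Features 1 and 2 are N(0, 1) marginally in both classes, but their
    # correlation is +rho in class 0 and -rho in class 1.
    X0[:, 2] = rho * X0[:, 1] + np.sqrt(1 - rho ** 2) * X0[:, 2]
    X1[:, 2] = -rho * X1[:, 1] + np.sqrt(1 - rho ** 2) * X1[:, 2]
    X = np.vstack([X0, X1])
    y = np.repeat([0, 1], n)
    top, _ = marginal_screen(X, y, k=1)
    rest = [j for j in range(p) if j not in top]
    print("marginal:", top, "pair:", pair_screen(X, y, rest, k=1))
```

In this toy setup, the marginal screen should pick feature 0, while features 1 and 2 look like noise one at a time. Only the bivariate score separates them, which mirrors the "marginally undetectable pairs" case the abstract describes. The O(n²) pairwise distance computation is chosen for clarity; it is not meant to be efficient at ultrahigh dimension.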