In this paper, we propose a model-free feature selection method for ultrahigh-dimensional data with a massive number of features. The method is a two-phase procedure that combines the fused Kolmogorov filter with random-forest-based recursive feature elimination (RFE) to remove model limitations and reduce computational complexity. It is fully nonparametric and works with various types of datasets. It has several appealing properties, namely accuracy, freedom from model assumptions, and computational efficiency, and it applies to a wide range of practical problems, including multiclass classification, nonparametric regression, and Poisson regression. We show that the proposed method is selection consistent and $L_2$ consistent under weak regularity conditions. Simulations and real data examples further demonstrate that the proposed method outperforms existing methods.
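To make the screening phase concrete, the following is a minimal sketch of a fused Kolmogorov-type filter in plain Python. It is an illustration under stated assumptions, not the implementation used in the paper: we assume that screening comes first, that a continuous response is cut into equal-frequency slices, and that a feature's score is the sum, over several slicing schemes, of the largest pairwise Kolmogorov-Smirnov distance between slices. The names `slice_counts`, `keep`, and `make_toy_data` are hypothetical. The second phase is not shown; one plausible choice would be to pass the surviving columns to scikit-learn's `RFE` wrapped around a random forest estimator.

```python
import math
import random
from bisect import bisect_right
from itertools import combinations


def ks_distance(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: sup_x |F_a(x) - F_b(x)|."""
    a, b = sorted(a), sorted(b)
    # The supremum of the ECDF gap is attained at one of the pooled sample points.
    return max(
        abs(bisect_right(a, x) / len(a) - bisect_right(b, x) / len(b))
        for x in a + b
    )


def slice_labels(y, n_slices):
    """Assign each response to one of n_slices equal-frequency slices (by rank)."""
    order = sorted(range(len(y)), key=lambda i: y[i])
    labels = [0] * len(y)
    for rank, i in enumerate(order):
        labels[i] = rank * n_slices // len(y)
    return labels


def fused_kolmogorov_scores(X, y, slice_counts=(3, 4, 5)):
    """Score each column of X; larger scores suggest stronger dependence on y."""
    n_features = len(X[0])
    scores = [0.0] * n_features
    for g in slice_counts:  # assumed set of slicing schemes
        labels = slice_labels(y, g)
        for j in range(n_features):
            groups = [
                [row[j] for row, h in zip(X, labels) if h == s] for s in range(g)
            ]
            # Largest distance between the conditional distributions of any two slices.
            scores[j] += max(ks_distance(p, q) for p, q in combinations(groups, 2))
    return scores


def screen(X, y, keep, slice_counts=(3, 4, 5)):
    """Return the sorted indices of the `keep` highest-scoring features."""
    scores = fused_kolmogorov_scores(X, y, slice_counts)
    ranked = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:keep])


def make_toy_data(n, p, seed):
    """Synthetic data: only features 0 and 1 drive a nonlinear response."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
    y = [math.exp(row[0] - row[1] + 0.2 * rng.gauss(0.0, 1.0)) for row in X]
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data(n=300, p=30, seed=0)
    print(screen(X, y, keep=5))
```

Because the slices depend only on the ranks of the response, the scores are unchanged by any strictly increasing transformation of `y`, which is one way to see why this kind of screening needs no model for the relationship between features and response. For a categorical response, the class labels would serve as the slices directly.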