Feature screening approaches are effective at selecting active features from data with ultrahigh dimensionality and increasing complexity; however, most existing feature screening approaches are either restricted to a univariate response or rely on distributional or model assumptions. In this article, we propose a novel sure independence screening approach based on the multivariate rank distance correlation (MrDc-SIS). MrDc-SIS has several desirable properties: it is distribution-free, completely nonparametric, scale-free, robust to outliers and heavy tails, and sensitive to hidden structures. Moreover, MrDc-SIS can be used to screen either univariate or multivariate responses and either one-dimensional or multi-dimensional predictors. We establish the asymptotic sure screening consistency of MrDc-SIS under a mild condition, lifting the finite-moment assumptions made in previous work. Simulation studies demonstrate that MrDc-SIS outperforms three other closely related approaches under various settings. We also apply MrDc-SIS to a multi-omics ovarian carcinoma dataset downloaded from The Cancer Genome Atlas (TCGA).