Simultaneously identifying contributory variables and controlling the false discovery rate (FDR) in high-dimensional data is an important statistical problem. In this paper, we propose a novel model-free variable selection procedure for sufficient dimension reduction based on a data-splitting technique. We first connect the variable selection problem to a least-squares procedure with several response transformations. We then construct a series of statistics with a global symmetry property and use this symmetry to derive a data-driven threshold that controls the error rate. The method achieves finite-sample and asymptotic FDR control under mild conditions. Numerical experiments indicate that our procedure controls the FDR satisfactorily and has higher power than existing methods.
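The pipeline summarized above (split the data, fit least squares to transformed responses on each half, combine the two fits into statistics that are symmetric about zero for null variables, then pick a threshold from that symmetry) can be illustrated with a minimal Python sketch. It is not the authors' procedure: the slice-indicator response transformations, the inner-product combination of the two halves, the FDP estimate, and the helper names (`mirror_statistics`, `data_driven_threshold`, `select`) are illustrative assumptions. It also uses plain least squares, so it assumes each half has more observations than variables rather than a genuinely high-dimensional setting.

```python
import numpy as np


def ls_coefficients(X, F):
    # Least-squares fit of every transformed response (column of F) on centered X.
    # Returns a p x H coefficient matrix. Assumes n_half > p (plain OLS).
    Xc = X - X.mean(axis=0)
    Fc = F - F.mean(axis=0)
    B, *_ = np.linalg.lstsq(Xc, Fc, rcond=None)
    return B


def mirror_statistics(X, y, n_slices=4, rng=None):
    # Hypothetical helper: one statistic per variable from a random 50/50 split.
    rng = np.random.default_rng(rng)
    perm = rng.permutation(len(y))
    i1, i2 = perm[: len(y) // 2], perm[len(y) // 2 :]
    # Assumed response transformations: indicators of y falling in each of
    # n_slices quantile slices, with slice edges fixed from the full sample
    # so both halves use the same transformations.
    edges = np.quantile(y, np.linspace(0, 1, n_slices + 1)[1:-1])

    def transform(yy):
        idx = np.searchsorted(edges, yy, side="right")
        F = np.zeros((len(yy), n_slices))
        F[np.arange(len(yy)), idx] = 1.0
        return F

    B1 = ls_coefficients(X[i1], transform(y[i1]))
    B2 = ls_coefficients(X[i2], transform(y[i2]))
    # Inner product of the two halves' coefficient rows. For a null variable the
    # two estimates are independent and centered near zero, so W_j is roughly
    # symmetric about zero; for an active variable it tends to be large and positive.
    return np.sum(B1 * B2, axis=1)


def data_driven_threshold(W, q=0.1):
    # Smallest t > 0 with #{W_j <= -t} / max(#{W_j >= t}, 1) <= q.
    # The negative tail estimates how many nulls exceed t in the positive tail.
    for t in np.sort(np.abs(W[W != 0])):
        fdp_hat = np.sum(W <= -t) / max(np.sum(W >= t), 1)
        if fdp_hat <= q:
            return t
    return np.inf  # no threshold achieves the target: select nothing


def select(X, y, q=0.1, n_slices=4, rng=None):
    # Indices of selected variables at nominal FDR level q.
    W = mirror_statistics(X, y, n_slices, rng)
    return np.flatnonzero(W >= data_driven_threshold(W, q))


if __name__ == "__main__":
    gen = np.random.default_rng(0)
    X = gen.standard_normal((400, 50))
    y = X[:, 0] + X[:, 1] + 0.5 * gen.standard_normal(400)
    print(select(X, y, q=0.2, rng=1))
```

The threshold step uses only the sign symmetry of the null statistics, which is why no model for y given X has to be specified; a penalized fit in place of `np.linalg.lstsq` would be needed when p is comparable to or larger than n.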