Variable screening is an active research area for ultrahigh-dimensional data. When some predictors are marginally dependent on the response and others are only jointly dependent on it, existing methods such as conditional screening and iterative screening often suffer, respectively, from instability with respect to the choice of the conditional set and from a heavy computational burden. In this article, we propose a new independence measure, named conditional martingale difference divergence (CMDH), that can be treated as either a conditional or a marginal independence measure. Under regularity conditions, we show that the sure screening property of CMDH holds for both marginally and jointly active variables. Based on this measure, we propose a kernel-based, model-free variable screening method that is efficient, flexible, and stable under high correlation among predictors and heterogeneity of the response. In addition, we provide a data-driven method for selecting the conditional set. Simulations and real data applications demonstrate the superior performance of the proposed method.