Variable screening is a useful tool for analyzing ultrahigh-dimensional data. When the response has both marginally and jointly dependent predictors, existing methods struggle. Conditional screening is often unstable with respect to the choice of the conditional set, and iterative screening carries a heavy computational burden. In this article, we propose a new independence measure, the conditional martingale difference divergence (CMDH), which can serve as either a conditional or a marginal independence measure. Under regularity conditions, we show that CMDH-based screening enjoys the sure screening property for both marginally and jointly active variables. Based on this measure, we propose a kernel-based, model-free variable screening method that is efficient, flexible, and robust to high correlation among predictors and to heterogeneity of the response. We also provide a data-driven method for selecting the conditional set. Simulations and real-data applications demonstrate the superior performance of the proposed method.
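The abstract does not give the CMDH estimator, so the sketch below illustrates only the simpler, unconditional building block it extends. That building block is the sample martingale difference divergence (MDD) of Shao and Zhang, used to rank predictors one at a time (marginal screening). This is not the proposed CMDH method: it is not kernel-based, and it does not condition on any set of predictors. The helper names `mdd_sq` and `mdd_screen`, the cut-off `d`, and the simulated model are hypothetical choices made for illustration.

```python
import numpy as np


def mdd_sq(x, y):
    """Sample squared martingale difference divergence MDD_n(Y | X)^2.

    V-statistic form for a scalar predictor x and a scalar response y:
        -(1/n^2) * sum_{k,l} |x_k - x_l| * (y_k - ybar) * (y_l - ybar)
    The sample value is nonnegative up to rounding. Under moment conditions,
    the population MDD is zero iff E[Y | X] = E[Y] almost surely.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.shape[0]
    dist = np.abs(x[:, None] - x[None, :])  # pairwise distances |x_k - x_l|
    yc = y - y.mean()                        # centered response
    return -(yc @ dist @ yc) / n**2


def mdd_screen(X, y, d):
    """Marginal screening: score each column of X by MDD and keep the top d.

    Hypothetical helper for illustration only. It scores every predictor
    on its own and does NOT condition on other predictors, unlike CMDH.
    """
    scores = np.array([mdd_sq(X[:, j], y) for j in range(X.shape[1])])
    keep = np.argsort(scores)[::-1][:d]  # indices of the d largest scores
    return keep, scores


if __name__ == "__main__":
    # Hypothetical simulated model: y depends on column 0 linearly and
    # on column 1 through its square; the remaining columns are pure noise.
    rng = np.random.default_rng(0)
    n, p = 400, 30
    X = rng.standard_normal((n, p))
    y = 2 * X[:, 0] + 2 * X[:, 1] ** 2 + 0.5 * rng.standard_normal(n)
    keep, scores = mdd_screen(X, y, d=5)
    print("retained predictors:", sorted(keep.tolist()))
```

The double sum costs O(n^2) time and memory per predictor, which is acceptable for moderate sample sizes. Because each predictor is scored in isolation, a marginal rule like this can miss variables that are active only jointly. The conditional measure proposed in the abstract is designed to address that gap.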