In this article, we develop a distributed variable screening method for generalized linear models, designed for settings in which both the sample size and the number of covariates are large. The proposed method selects relevant covariates using a sparsity-restricted surrogate likelihood estimator. Because it accounts for the joint effects of the covariates rather than only their marginal effects, it enhances the reliability of the screening results. We establish the sure screening property of the proposed method, which guarantees that, with high probability, the selected model contains the true model. Simulation studies evaluate the finite-sample performance of the proposed method, and an application to a real dataset demonstrates its practical utility.
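As an illustration only, the following Python sketch shows one way a sparsity-restricted surrogate likelihood screen could be organized for logistic regression, one member of the GLM family. Several choices here are our assumptions rather than the authors' specification. The surrogate loss uses the communication-efficient form L_1(beta) - <grad L_1(beta_bar) - grad L_N(beta_bar), beta>, where L_1 is the central machine's local loss and L_N the full-sample loss. The constraint ||beta||_0 <= k is enforced by iterative hard thresholding. The names `distributed_screen` and `iht`, the model size `k`, the step size, and the number of communication rounds are all hypothetical.

```python
import numpy as np


def logistic_grad(X, y, beta):
    """Gradient of the average negative log-likelihood for logistic regression."""
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    return X.T @ (p - y) / X.shape[0]


def hard_threshold(beta, k):
    """Keep the k largest-magnitude entries of beta and set the rest to zero."""
    out = np.zeros_like(beta)
    keep = np.argsort(np.abs(beta))[-k:]
    out[keep] = beta[keep]
    return out


def iht(grad_fn, p, k, step=1.0, n_iter=300, beta0=None):
    """Iterative hard thresholding: a gradient step, then projection onto ||beta||_0 <= k."""
    beta = np.zeros(p) if beta0 is None else beta0.copy()
    for _ in range(n_iter):
        beta = hard_threshold(beta - step * grad_fn(beta), k)
    return beta


def distributed_screen(blocks, k, n_rounds=2, step=1.0):
    """Hypothetical distributed screen via a sparsity-restricted surrogate likelihood.

    blocks: list of (X_m, y_m) pairs, one per machine. Machine 0 acts as the
    central machine and is the only one that solves an optimization problem.
    Returns the indices of the selected covariates and the sparse estimate.
    """
    X1, y1 = blocks[0]
    p = X1.shape[1]
    n_total = sum(X.shape[0] for X, _ in blocks)

    # Initial sparse estimator computed from the central machine's data only.
    beta_bar = iht(lambda b: logistic_grad(X1, y1, b), p, k, step)

    for _ in range(n_rounds):
        # One communication round: every machine sends its local gradient at
        # beta_bar, and the center forms the sample-size-weighted global gradient.
        global_grad = sum(
            X.shape[0] * logistic_grad(X, y, beta_bar) for X, y in blocks
        ) / n_total
        shift = logistic_grad(X1, y1, beta_bar) - global_grad

        # Gradient of the surrogate loss L_1(beta) - <shift, beta>.
        def surrogate_grad(b, shift=shift):
            return logistic_grad(X1, y1, b) - shift

        # Minimize the surrogate subject to ||beta||_0 <= k, warm-started at beta_bar.
        beta_bar = iht(surrogate_grad, p, k, step, beta0=beta_bar)

    return np.flatnonzero(beta_bar), beta_bar


# Small simulated example: 4 machines, 50 covariates, 3 of them active.
rng = np.random.default_rng(0)
n_per, p, n_machines = 500, 50, 4
beta_true = np.zeros(p)
beta_true[:3] = [2.0, -2.0, 1.5]
blocks = []
for _ in range(n_machines):
    X = rng.standard_normal((n_per, p))
    y = rng.binomial(1, 1.0 / (1.0 + np.exp(-X @ beta_true))).astype(float)
    blocks.append((X, y))

selected, beta_hat = distributed_screen(blocks, k=10)
print(sorted(selected.tolist()))
```

In this sketch only gradient vectors of length p leave each machine, so every round costs O(p) communication per machine. Setting `k` larger than the presumed true model size reflects the screening goal: the retained set only needs to contain the active covariates, not to match them exactly.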