Comparing Variable Selection and Model Averaging Methods for Logistic Regression

Model uncertainty, which arises when it is unclear which predictors should be included in a model, is a central challenge for statistical models of binary outcomes such as logistic regression. Many methods have been proposed to address this issue for logistic regression, but their relative performance under realistic conditions remains poorly understood. We therefore conducted a preregistered, simulation-based comparison of 28 established methods for variable selection and inference under model uncertainty, using 11 empirical datasets that span a range of sample sizes and numbers of predictors, both with and without separation. We found that Bayesian model averaging methods based on g-priors, particularly with g = max(n, p^2), show the strongest overall performance when separation is absent. When separation occurs, penalized likelihood approaches, especially the LASSO, provide the most stable results, while Bayesian model averaging with the local empirical Bayes (EB-local) prior is competitive in both situations. These findings offer applied researchers practical guidance on addressing model uncertainty in logistic regression in modern empirical and machine learning research.
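To make two terms from the abstract concrete, the following is a minimal sketch rather than the authors' code. It assumes that n denotes the sample size and p the number of candidate predictors in g = max(n, p^2), and it checks separation only in the simplest case of a single predictor, where a threshold splits the two outcome classes perfectly. The function names `benchmark_g` and `single_predictor_separation` are hypothetical. Separation in general can involve a linear combination of several predictors, which this check does not detect.

```python
from typing import Sequence


def benchmark_g(n: int, p: int) -> int:
    """Return g = max(n, p^2), assuming n = sample size and p = number of predictors."""
    return max(n, p ** 2)


def single_predictor_separation(x: Sequence[float], y: Sequence[int]) -> bool:
    """Check complete separation for ONE predictor (a simplification).

    Returns True when every x value observed with y == 0 lies strictly on one
    side of every x value observed with y == 1. In that case the maximum
    likelihood estimate of the logistic slope does not exist (it diverges).
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        raise ValueError("both outcome classes must be present")
    return max(x0) < min(x1) or max(x1) < min(x0)


if __name__ == "__main__":
    # With few predictors the sample size dominates; with many, p^2 does.
    print(benchmark_g(n=100, p=5))
    print(benchmark_g(n=50, p=10))

    # Toy data: the first predictor separates the classes, the second does not.
    print(single_predictor_separation([1, 2, 3, 4], [0, 0, 1, 1]))
    print(single_predictor_separation([1, 3, 2, 4], [0, 0, 1, 1]))
```

The g value only sets the prior scale; fitting the model-averaged logistic regression itself requires dedicated software and is outside the scope of this sketch.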


