We characterise the behaviour of the maximum Diaconis-Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We derive the estimator's aggregate asymptotic behaviour when the covariates are independent normal random variables with mean zero and variance $1/n$, and the vector of regression coefficients has length $\gamma \sqrt{n}$ asymptotically. Building on this, we derive adjusted $Z$-statistics, penalized likelihood ratio statistics, and aggregate asymptotic results for arbitrary covariate covariance. In the process, we fill gaps in the previous literature by formulating a Lipschitz-smooth approximate message passing recursion, which formally transfers the asymptotic results from approximate message passing to logistic regression. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis-Ylvisaker prior penalized likelihood estimate always exists and is directly computable using maximum likelihood routines. Thus, our asymptotic results also hold for $(\kappa, \gamma)$ values where results for maximum likelihood are not attainable, with no overhead in implementation or computation. We study the estimator's shrinkage properties, compare it to logistic ridge regression, and demonstrate our theoretical findings with simulations.
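To make the claim that the estimate is computable with maximum likelihood routines concrete, the following is a minimal sketch and not the authors' implementation. It assumes, beyond what the abstract states, that the penalized estimate coincides with the logistic maximum likelihood fit on pseudo-responses $\alpha y + (1 - \alpha)/2$ for a shrinkage parameter $\alpha \in (0, 1)$. The function names, the value of `alpha`, and the hand-rolled Newton solver (standing in for any maximum likelihood routine) are illustrative choices. The simulated design follows the setting above: covariates with variance $1/n$ and a coefficient vector of length $\gamma \sqrt{n}$.

```python
import numpy as np


def simulate(n, kappa, gamma, rng):
    """Draw data from the setting in the abstract (illustrative helper)."""
    p = int(kappa * n)
    # Independent N(0, 1/n) covariates.
    X = rng.normal(0.0, 1.0 / np.sqrt(n), size=(n, p))
    # Coefficient vector rescaled to have length gamma * sqrt(n).
    beta = rng.normal(size=p)
    beta *= gamma * np.sqrt(n) / np.linalg.norm(beta)
    prob = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))  # logistic function
    y = rng.binomial(1, prob).astype(float)
    return X, y, beta


def loglik(beta, X, y):
    """Logistic log-likelihood; also valid for responses in (0, 1)."""
    eta = X @ beta
    return np.sum(y * eta - np.logaddexp(0.0, eta))


def fit_logistic_ml(X, y, max_iter=100, tol=1e-8):
    """Newton iterations with step halving, a stand-in for an ML routine."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        mu = 0.5 * (1.0 + np.tanh(0.5 * (X @ beta)))
        score = X.T @ (y - mu)
        if np.max(np.abs(score)) < tol:
            break
        info = X.T @ (X * (mu * (1.0 - mu))[:, None])
        step = np.linalg.solve(info, score)
        t, base = 1.0, loglik(beta, X, y)
        # Halve the step only if it clearly decreases the log-likelihood.
        while loglik(beta + t * step, X, y) < base - 1e-10 and t > 1e-8:
            t /= 2.0
        beta = beta + t * step
    return beta


def fit_penalized(X, y, alpha):
    """Assumed form: ML fit on pseudo-responses alpha * y + (1 - alpha) / 2."""
    y_star = alpha * y + (1.0 - alpha) / 2.0
    return fit_logistic_ml(X, y_star), y_star


rng = np.random.default_rng(0)
X, y, beta_true = simulate(n=400, kappa=0.2, gamma=1.0, rng=rng)
alpha = 0.9  # hypothetical shrinkage level, chosen only for illustration
beta_hat, y_star = fit_penalized(X, y, alpha)
print(np.linalg.norm(beta_hat), np.linalg.norm(beta_true))
```

Because the pseudo-responses lie strictly inside $(0, 1)$ for $\alpha < 1$, the fit stays finite even when the original responses are separable, which is why the same routine can be applied for any $(\kappa, \gamma)$ under this assumed form.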