We characterise the behaviour of the maximum Diaconis-Ylvisaker prior penalized likelihood estimator in high-dimensional logistic regression, where the number of covariates is a fraction $\kappa \in (0,1)$ of the number of observations $n$, as $n \to \infty$. We derive the estimator's aggregate asymptotic behaviour under this proportional asymptotic regime, when covariates are independent normal random variables with mean zero and the linear predictor has asymptotic variance $\gamma^2$. From this foundation, we devise adjusted $Z$-statistics, penalized likelihood ratio statistics, and aggregate asymptotic results with arbitrary covariate covariance. While the maximum likelihood estimate asymptotically exists only for a narrow range of $(\kappa, \gamma)$ values, the maximum Diaconis-Ylvisaker prior penalized likelihood estimate not only always exists but is also directly computable using maximum likelihood routines. Thus, our asymptotic results also hold for $(\kappa, \gamma)$ values where results for maximum likelihood are not attainable, with no overhead in implementation or computation. We study the estimator's shrinkage properties, compare it to alternative estimation methods that can operate with proportional asymptotics, and present procedures for estimating the unknown constants that describe the asymptotic behaviour of our estimator. We also provide a conjecture about the behaviour of our estimator when an intercept parameter is present in the model. We present results from extensive numerical studies that demonstrate the theoretical advances and provide strong evidence in support of the conjecture, and we illustrate the methodology through the analysis of a real-world data set on digit recognition.
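
A brief sketch of why the estimate can be computed with maximum likelihood routines, under assumptions not spelled out above: the Diaconis-Ylvisaker prior is taken to be centred at $\beta = 0$ with a shrinkage parameter $\alpha \in (0,1)$, and the symbols $\alpha$, $x_i$, $y_i^*$ and $\zeta(\eta) = \log(1 + e^{\eta})$ are notation introduced here for illustration only. With responses $y_i \in \{0,1\}$ and logistic log-likelihood $\ell(\beta) = \sum_{i=1}^{n} \{ y_i x_i^\top \beta - \zeta(x_i^\top \beta) \}$, the penalized objective can then be written, up to a positive scaling constant, as

$$
\alpha \sum_{i=1}^{n} \left\{ y_i x_i^\top \beta - \zeta(x_i^\top \beta) \right\}
+ (1 - \alpha) \sum_{i=1}^{n} \left\{ \tfrac{1}{2} x_i^\top \beta - \zeta(x_i^\top \beta) \right\}
= \sum_{i=1}^{n} \left\{ y_i^* x_i^\top \beta - \zeta(x_i^\top \beta) \right\},
\qquad y_i^* = \alpha y_i + \frac{1 - \alpha}{2}.
$$

Under these assumptions the objective is a logistic log-likelihood evaluated at pseudo-responses $y_i^* \in (0,1)$, so a standard logistic regression fit to $y_i^*$ would return the penalized estimate. Because no $y_i^*$ equals $0$ or $1$, the maximizer is finite whenever the covariate matrix has full column rank, including for perfectly separated data.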