We analytically investigate the behaviour of the penalized maximum partial likelihood estimator (PMPLE). Our results are derived for a generic separable regularization, but we focus on the elastic net. This penalization is routinely adopted for survival analysis in the high-dimensional regime, where the unregularized maximum partial likelihood estimator might not even exist. Previous theoretical results require the number $s$ of non-zero association coefficients to be $O(n^{\alpha})$, with $\alpha \in (0,1)$ and $n$ the sample size. Here we accurately characterize the behaviour of the PMPLE when $s$ is proportional to $n$, via the solution of a system of six non-linear equations that can be easily solved by fixed-point iteration. These equations are derived by means of the replica method, under the assumption that the covariates $\mathbf{X}\in \mathbb{R}^p$ follow a multivariate Gaussian law with covariance $\mathbf{I}_p/p$. Solving these equations allows us to investigate how various metrics of interest depend on the ratio $\zeta = p/n$, the fraction of truly active components $\nu = s/p$, and the regularization strength. We validate our results by extensive numerical simulations.
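For readers who want to reproduce the kind of simulation the abstract refers to, here is a minimal sketch of the PMPLE itself. It is not the authors' solver and does not reproduce the six replica equations. It assumes the standard Cox partial likelihood without tied event times and a glmnet-style elastic net with mixing parameter $r$ (called `l1_ratio` below, to avoid a clash with the exponent $\alpha$ above):
$\hat{\boldsymbol\beta} = \arg\min_{\boldsymbol\beta} \Big\{ -\sum_{i} \delta_i \Big[\mathbf{X}_i^\top\boldsymbol\beta - \log \sum_{j:\, t_j \ge t_i} e^{\mathbf{X}_j^\top\boldsymbol\beta}\Big] + \lambda\Big[r\|\boldsymbol\beta\|_1 + \tfrac{1-r}{2}\|\boldsymbol\beta\|_2^2\Big] \Big\}$.
The loss is deliberately not divided by $n$, so that both terms grow linearly in $n$ when $p$ and $s$ are proportional to $n$; this normalization is an assumption. The solver is proximal gradient descent with backtracking, chosen here for brevity. All function names are hypothetical. The data generator follows the abstract's covariate law, $\mathbf{X}\sim\mathcal{N}(\mathbf{0},\mathbf{I}_p/p)$ with $s=\nu p$ active coefficients. Its other ingredients are illustrative assumptions: an exponential baseline hazard, standard normal non-zero coefficients, and exponential censoring.

```python
import numpy as np


def simulate(n, zeta, nu, seed=0):
    """Hypothetical generator: rows of X ~ N(0, I_p / p), s = nu * p active covariates."""
    rng = np.random.default_rng(seed)
    p = int(round(zeta * n))
    s = int(round(nu * p))
    X = rng.normal(0.0, 1.0 / np.sqrt(p), size=(n, p))
    beta0 = np.zeros(p)
    beta0[:s] = rng.normal(size=s)                          # assumed law of the non-zero coefficients
    t_event = rng.exponential(size=n) / np.exp(X @ beta0)   # assumed unit exponential baseline hazard
    t_cens = rng.exponential(scale=2.0, size=n)             # assumed independent exponential censoring
    event = (t_event <= t_cens).astype(float)
    return X, np.minimum(t_event, t_cens), event, beta0


def cox_nll_and_grad(beta, X, time, event):
    """Negative log partial likelihood and its gradient, assuming no tied times."""
    order = np.argsort(-time)               # decreasing time: risk set of row i is rows 0..i
    Xs, d = X[order], event[order]
    eta = Xs @ beta
    log_s = np.logaddexp.accumulate(eta)    # log of sum_{j <= i} exp(eta_j), computed stably
    nll = -np.sum(d * (eta - log_s))
    # d nll / d eta_k = -(d_k - exp(eta_k) * sum_{i >= k} d_i / S_i)
    tail = np.cumsum((d * np.exp(-log_s))[::-1])[::-1]
    grad_eta = -(d - np.exp(eta) * tail)
    return nll, Xs.T @ grad_eta


def elastic_net_penalty(beta, lam, l1_ratio):
    return lam * (l1_ratio * np.sum(np.abs(beta)) + 0.5 * (1.0 - l1_ratio) * (beta @ beta))


def elastic_net_prox(v, step, lam, l1_ratio):
    """Proximal map of step * elastic_net_penalty: soft-thresholding, then ridge shrinkage."""
    soft = np.sign(v) * np.maximum(np.abs(v) - step * lam * l1_ratio, 0.0)
    return soft / (1.0 + step * lam * (1.0 - l1_ratio))


def fit_pmple(X, time, event, lam, l1_ratio, n_iter=300, step=1.0, shrink=0.5):
    """Proximal gradient descent with backtracking on the penalized objective."""
    beta = np.zeros(X.shape[1])
    f, g = cox_nll_and_grad(beta, X, time, event)
    for _ in range(n_iter):
        while True:
            cand = elastic_net_prox(beta - step * g, step, lam, l1_ratio)
            f_c, g_c = cox_nll_and_grad(cand, X, time, event)
            diff = cand - beta
            # Accept once the quadratic upper bound holds; the tiny slack guards against round-off.
            if f_c <= f + g @ diff + (diff @ diff) / (2.0 * step) + 1e-12 * (1.0 + abs(f)):
                break
            step *= shrink
        beta, f, g = cand, f_c, g_c
    return beta
```

A typical use is `X, time, event, beta0 = simulate(n=400, zeta=0.5, nu=0.2)` followed by `beta_hat = fit_pmple(X, time, event, lam=0.5, l1_ratio=0.5)`. Metrics such as the number of non-zero entries of `beta_hat` or its mean squared distance from `beta0` can then be compared across values of $\zeta$, $\nu$ and $\lambda$.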