This note extends the results of classical parametric statistics, such as the Fisher and Wilks theorems, to modern setups with a high or infinite parameter dimension, limited sample size, and possible model misspecification. We consider a special class of stochastically linear smooth (SLS) models satisfying three major conditions: the stochastic component of the log-likelihood is linear in the model parameter, and the expected log-likelihood is a smooth and concave function. For the penalized maximum likelihood estimators (pMLE), we establish three types of results: (1) concentration in a small vicinity of the ``truth''; (2) Fisher and Wilks expansions; (3) risk bounds. In all results, the remainder is given explicitly and can be evaluated in terms of the effective sample size and the effective parameter dimension, which allows us to identify the so-called \emph{critical parameter dimension}. The results are also dimension-free and coordinate-free. The obtained finite-sample expansions are of special interest because they can be used not only for obtaining risk bounds but also for inference, for studying the asymptotic distribution, for the analysis of resampling procedures, etc. The main tool for all these expansions is the so-called ``basic lemma'' about linearly perturbed optimization. Despite their generality, all the presented bounds are nearly sharp, and the classical asymptotic results can be obtained as simple corollaries. Our results indicate that the use of advanced fourth-order expansions allows one to relax the critical dimension condition $ \mathbb{p}^{3} \ll n $ from Spokoiny (2023a) to $ \mathbb{p}^{3/2} \ll n $. Examples for classical models such as logistic regression, log-density estimation, and precision matrix estimation illustrate the applicability of the general results.