Generalized Linear Models (GLMs) extend ordinary linear regression by linking the mean of the response variable to covariates through appropriate link functions. This paper investigates the asymptotic behavior of GLM estimators when the parameter dimension $d$ grows with the sample size $n$. In the first part, we establish Gaussian approximation results for the distribution of a properly centered and scaled GLM estimator, uniformly over the classes of convex sets and Euclidean balls. Using high-dimensional results from Fang and Koike (2024) for the leading Bahadur term, bounding remainder terms as in He and Shao (2000), and applying Nazarov's (2003) Gaussian isoperimetric inequality, we show that Gaussian approximation holds when $d = o(n^{2/5})$ for convex sets and $d = o(n^{1/2})$ for Euclidean balls, which are the best possible rates, matching those for high-dimensional sample means. We further extend these results to the bootstrap approximation when the covariance matrix is unknown. In the second part, when $d \gg n$, a natural question is whether all covariates are equally important. To address this, we induce sparsity in the GLM through the Lasso estimator. While the Lasso is widely used for variable selection, it cannot achieve both Variable Selection Consistency (VSC) and $n^{1/2}$-consistency simultaneously (Lahiri, 2021). Under the regime ensuring VSC, we show that Gaussian approximation for the Lasso estimator fails. To overcome this, we propose a Perturbation Bootstrap (PB) approach and establish a Berry-Esseen type bound for its approximation, uniformly over the class of convex sets. Simulation studies confirm the strong finite-sample performance of the proposed method.
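The perturbation bootstrap idea above can be sketched in the simplest GLM, the linear model with identity link: fit the Lasso once, then repeatedly reweight each observation's loss by i.i.d. mean-one multipliers and refit. The Exponential(1) weights, penalty level, and coordinate-descent solver below are illustrative assumptions for this sketch, not the paper's exact construction.

```python
import numpy as np

def lasso_cd(X, y, lam, w=None, n_iter=100):
    # Coordinate descent for the weighted Lasso objective:
    #   (1/2n) * sum_i w_i * (y_i - x_i' b)^2 + lam * ||b||_1
    n, d = X.shape
    w = np.ones(n) if w is None else w
    b = np.zeros(d)
    for _ in range(n_iter):
        for j in range(d):
            r = y - X @ b + X[:, j] * b[j]        # partial residual excluding j
            rho = np.mean(w * X[:, j] * r)
            z = np.mean(w * X[:, j] ** 2)
            b[j] = np.sign(rho) * max(abs(rho) - lam, 0.0) / z  # soft-threshold
    return b

rng = np.random.default_rng(0)
n, d = 200, 10
X = rng.standard_normal((n, d))
beta = np.zeros(d)
beta[:3] = 1.0                                    # sparse truth: 3 active covariates
y = X @ beta + rng.standard_normal(n)

beta_hat = lasso_cd(X, y, lam=0.05)

# PB step: multiply each observation's loss by i.i.d. Exponential(1)
# weights (mean 1, variance 1) and refit; repeat B times
B = 100
boot = np.empty((B, d))
for b_idx in range(B):
    w = rng.exponential(1.0, size=n)
    boot[b_idx] = lasso_cd(X, y, lam=0.05, w=w)

# sqrt(n)-scaled bootstrap deviations approximate the sampling
# distribution of sqrt(n) * (beta_hat - beta)
pb_dist = np.sqrt(n) * (boot - beta_hat)
```

The mean-one, positive-variance weights are what make the reweighted objective mimic the sampling variability of the original loss; any such multiplier distribution could be substituted.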