Universality in block dependent linear models with applications to nonparametric regression

Over the past decade, characterizing the exact asymptotic risk of regularized estimators in high-dimensional regression has emerged as a popular line of work. This literature considers the proportional asymptotics framework, where the number of features and the number of samples both diverge at proportional rates. Substantial work in this area relies on Gaussianity assumptions on the observed covariates, and often further assumes that the design entries are independent and identically distributed. Parallel research investigates the universality of these findings, revealing that results based on the i.i.d. Gaussian assumption extend to a broad class of designs, such as i.i.d. sub-Gaussian designs. However, universality results for dependent covariates have so far focused on correlation-based dependence or on a highly structured form of dependence, as permitted by right-rotationally invariant designs. In this paper, we break this barrier and study a dependence structure that, in general, falls outside the purview of these established classes. We seek to pin down the extent to which results based on i.i.d. Gaussian assumptions persist. We identify a class of designs, characterized by a block dependence structure, that ensures the universality of i.i.d. Gaussian-based results. We establish that the optimal value of the regularized empirical risk and the risk of convex regularized estimators, such as the Lasso and ridge, converge to the same limits under block dependent designs as they do under designs with i.i.d. Gaussian entries. Our dependence structure differs significantly from correlation-based dependence, and enables, for the first time, asymptotically exact risk characterization in prevalent nonparametric regression problems in high dimensions. Finally, we illustrate through experiments that this universality becomes evident quite early, even at moderate sample sizes.
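The kind of comparison described above can be sketched numerically. The following is a minimal illustration, not the experimental setup of the paper: the block construction (pairs of columns built from normalized Hermite polynomials of a shared latent Gaussian), the function names, and all parameter values are assumptions chosen for illustration. Within each pair the two columns are uncorrelated yet dependent, so the dependence is not captured by a covariance matrix. The sketch compares the ridge estimation risk under this design with the risk under an i.i.d. Gaussian design at the same aspect ratio.

```python
import numpy as np

def iid_gaussian_design(n, p, rng):
    """Design with i.i.d. standard Gaussian entries."""
    return rng.standard_normal((n, p))

def block_dependent_design(n, p, rng):
    """Hypothetical block dependent design with blocks of size 2 (p must be even).

    Each block is (z, (z^2 - 1) / sqrt(2)) for a latent z ~ N(0, 1). Both
    columns have mean 0 and variance 1 and are uncorrelated, but they are
    dependent: E[z^2 * (z^2 - 1) / sqrt(2)] = sqrt(2), not 0.
    """
    z = rng.standard_normal((n, p // 2))
    X = np.empty((n, p))
    X[:, 0::2] = z
    X[:, 1::2] = (z ** 2 - 1) / np.sqrt(2)
    return X

def ridge_estimation_risk(make_design, n, p, lam, sigma, reps, seed):
    """Monte Carlo average of ||beta_hat - beta||^2 for ridge regression."""
    rng = np.random.default_rng(seed)
    risks = []
    for _ in range(reps):
        X = make_design(n, p, rng)
        beta = rng.standard_normal(p) / np.sqrt(p)  # signal with ||beta||^2 close to 1
        y = X @ beta + sigma * rng.standard_normal(n)
        # Ridge solution: (X'X/n + lam * I)^{-1} X'y/n
        beta_hat = np.linalg.solve(X.T @ X / n + lam * np.eye(p), X.T @ y / n)
        risks.append(np.sum((beta_hat - beta) ** 2))
    return float(np.mean(risks))

# Illustrative parameters (assumed): aspect ratio p/n = 0.5, moderate n.
n, p, lam, sigma, reps = 400, 200, 0.5, 1.0, 20
risk_gauss = ridge_estimation_risk(iid_gaussian_design, n, p, lam, sigma, reps, seed=0)
risk_block = ridge_estimation_risk(block_dependent_design, n, p, lam, sigma, reps, seed=1)
print(f"i.i.d. Gaussian design: {risk_gauss:.4f}")
print(f"block dependent design: {risk_block:.4f}")
```

If universality holds for this toy construction, the two printed risks should be close, and the gap should shrink as n and p grow with p/n fixed. Replacing the ridge solve with a Lasso solver would give the analogous check for the Lasso.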
