Suppose that we first apply the Lasso to a design matrix and then update one of its columns. In general, the signs of the Lasso coefficients may change, and there is no closed-form expression for updating the Lasso solution exactly. In this work, we propose an approximate formula for updating a debiased Lasso coefficient. We provide general nonasymptotic error bounds in terms of the norms and correlations of the design matrix's columns, and then prove asymptotic convergence results for the case of a random design matrix with i.i.d.\ sub-Gaussian row vectors and i.i.d.\ Gaussian noise. Notably, the approximate formula is asymptotically correct for most coordinates in the proportional growth regime, under the mild assumption that each row of the design matrix is sub-Gaussian with a covariance matrix having a bounded condition number. Our proof requires only certain concentration and anti-concentration properties to control the various error terms and the number of sign changes. In contrast, rigorously establishing distributional limit properties (e.g.\ Gaussian limits for the debiased Lasso) under similarly general assumptions has been considered an open problem in universality theory. As applications, we show that the approximate formula reduces the computational complexity of variable selection algorithms that require solving multiple Lasso problems, such as the conditional randomization test and a variant of the knockoff filter.
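To illustrate why repeated column updates are costly, here is a minimal sketch of the brute-force conditional randomization test (CRT) that the approximate formula is meant to speed up. It is not the paper's update formula, which this abstract does not state. Assumptions, all hypothetical and chosen for illustration: the Lasso objective is (1/(2n))||y - Xb||^2 + lam·||b||_1 solved by plain cyclic coordinate descent; the columns of X are independent standard Gaussians, so the conditional law of column j given the others is simply N(0, I_n); the test statistic is the absolute (non-debiased) Lasso coefficient |b_j|. The names `lasso_cd` and `crt_pvalue` are invented for this sketch.

```python
import numpy as np


def soft_threshold(z, t):
    """Scalar soft-thresholding operator: sign(z) * max(|z| - t, 0)."""
    return np.sign(z) * max(abs(z) - t, 0.0)


def lasso_cd(X, y, lam, n_iter=200, tol=1e-8):
    """Minimise (1/(2n))||y - X b||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    resid = y.astype(float).copy()          # invariant: resid == y - X @ b
    col_sq = (X ** 2).sum(axis=0) / n       # ||X_j||^2 / n for each column
    for _ in range(n_iter):
        max_change = 0.0
        for j in range(p):
            if col_sq[j] == 0.0:
                continue
            old = b[j]
            # Partial-residual correlation X_j^T (resid + X_j * old) / n
            rho = X[:, j] @ resid / n + col_sq[j] * old
            new = soft_threshold(rho, lam) / col_sq[j]
            if new != old:
                resid -= X[:, j] * (new - old)
                b[j] = new
                max_change = max(max_change, abs(new - old))
        if max_change < tol:
            break
    return b


def crt_pvalue(X, y, j, lam, n_resamples=100, rng=None):
    """Brute-force CRT p-value for H0: X_j is independent of y given the other columns.

    Hypothetical assumption: columns of X are independent N(0, 1), so resampling
    X_j from its conditional law means drawing a fresh N(0, I_n) vector.
    """
    rng = np.random.default_rng(rng)
    t_obs = abs(lasso_cd(X, y, lam)[j])
    X_tilde = X.copy()
    count = 0
    for _ in range(n_resamples):
        X_tilde[:, j] = rng.standard_normal(X.shape[0])  # update a single column
        t_k = abs(lasso_cd(X_tilde, y, lam)[j])          # full Lasso refit each time
        if t_k >= t_obs:
            count += 1
    return (1 + count) / (n_resamples + 1)
```

Every one of the `n_resamples` iterations pays for a full Lasso solve even though only column j changed. That repeated refit is the cost an approximate update of the (debiased) coefficient is intended to avoid.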