Linear Functional Testing with General Loadings in Sparse Regression: Separation Rates and Computational Barriers

We study the problem of testing $H_0: \xi^\top\beta=t_0$ in high-dimensional sparse linear regression with Gaussian random design and unknown design covariance. The loading vector $\xi$ is arbitrary, and the exact sparsity level $k$ is unknown but bounded by a known value $k_u$. Tests are required to control Type I error uniformly over the $k_u$-sparse null, while power is evaluated against $k$-sparse alternatives. We construct a computationally efficient mixed test that gives an upper bound on the adaptive separation distance and establish an information-theoretic lower bound calibrated to the magnitude profile of $\xi$. In the ultra-sparse regime $k_u\lesssim \sqrt n/\log p$, these bounds characterize the adaptive separation rate up to logarithmic factors for arbitrary $\xi$. In the moderately sparse regime $\sqrt n/\log p\ll k_u\lesssim n/\log p$, these bounds match for several classes of loading vectors but may differ in general. In this regime, we further prove a low-degree lower bound that matches the upper bound up to logarithmic factors. This provides evidence that improving on the rate of the mixed test, if statistically possible, may be computationally hard. For flat sparse loadings, we complement this evidence with a polynomial-time reduction from sparse CCA. Finally, we examine how information about the design covariance affects the adaptive separation rate in two settings. Under a sparse signed-spiked covariance model, the information-theoretic lower bound is attainable up to logarithmic factors by a computationally inefficient procedure, while the low-degree lower bound and sparse-CCA reduction continue to apply, providing evidence for a statistical-computational gap. When the design covariance is known and diagonal, the adaptive separation rate takes the same form as in the ultra-sparse regime.
