In this paper, we study the problem of multivariate shuffled linear regression, where the correspondence between predictors and responses in a linear model is obfuscated by a latent permutation. Specifically, we investigate the model $Y=\tfrac{1}{\sqrt{1+\sigma^2}}(\Pi_* X Q_* + \sigma Z)$, where $X$ is an $n \times d$ standard Gaussian design matrix, $Z$ is an $n \times m$ Gaussian noise matrix, $\Pi_*$ is an unknown $n \times n$ permutation matrix, and $Q_*$ is an unknown $d \times m$ matrix on the Grassmannian manifold satisfying $Q_*^{\top} Q_* = \mathbb{I}_m$. We consider the hypothesis testing problem of distinguishing this model from the case where $X$ and $Y$ are independent Gaussian random matrices of sizes $n \times d$ and $n \times m$, respectively. Our results reveal a phase transition in the performance of low-degree polynomial algorithms for this task. (1) When $m=o(d)$, we show that all degree-$D$ polynomials fail to distinguish the two models even when $\sigma=0$, provided that $D^4=o\big( \tfrac{d}{m} \big)$. (2) When $m=d$ and $\sigma=\omega(1)$, we show that all degree-$D$ polynomials fail to distinguish the two models, provided that $D=o(\sigma)$. (3) When $m=d$ and $\sigma=o(1)$, we show that there exists a constant-degree polynomial that strongly distinguishes the two models. These results establish a smooth transition in the effectiveness of low-degree polynomial algorithms for this problem, highlighting the interplay between the dimensions $m$ and $d$, the noise level $\sigma$, and the computational complexity of the testing task.
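To make the two hypotheses concrete, the following is a minimal NumPy sketch of how one might sample from the planted model and from the null model. The function names `sample_planted` and `sample_null` are hypothetical, and drawing $Q_*$ by a QR factorization of a Gaussian matrix is an assumption made here for convenience; the abstract does not specify the prior on $Q_*$ or $\Pi_*$.

```python
import numpy as np


def sample_planted(n, d, m, sigma, rng):
    """Draw (X, Y) with Y = (Pi X Q + sigma * Z) / sqrt(1 + sigma^2).

    Assumption: Pi is a uniformly random permutation and Q is obtained from
    the QR factorization of a d x m Gaussian matrix (requires m <= d).
    """
    X = rng.standard_normal((n, d))   # standard Gaussian design
    Z = rng.standard_normal((n, m))   # Gaussian noise
    perm = rng.permutation(n)         # row i of Pi X is row perm[i] of X
    # Reduced QR returns a d x m factor with orthonormal columns: Q^T Q = I_m.
    Q, _ = np.linalg.qr(rng.standard_normal((d, m)))
    Y = (X[perm] @ Q + sigma * Z) / np.sqrt(1.0 + sigma**2)
    return X, Y, perm, Q


def sample_null(n, d, m, rng):
    """Draw independent standard Gaussian X (n x d) and Y (n x m)."""
    return rng.standard_normal((n, d)), rng.standard_normal((n, m))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X, Y, perm, Q = sample_planted(n=200, d=8, m=8, sigma=0.0, rng=rng)
    # With m = d and sigma = 0, Q is orthogonal, so row norms are preserved
    # up to the hidden permutation.
    print(np.allclose(np.linalg.norm(Y, axis=1), np.linalg.norm(X[perm], axis=1)))
```

The normalization by $\sqrt{1+\sigma^2}$ keeps each entry of $Y$ at unit variance, so the marginal law of $Y$ alone matches the null model and any successful test has to use $X$ and $Y$ jointly. The row-norm check in the usage example only illustrates why the noiseless $m=d$ case carries visible structure; it is not claimed to be the statistic analyzed in the paper.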