This paper considers linear regression with shuffled labels, i.e., $\mathbf Y = \mathbf \Pi \mathbf X \mathbf B + \mathbf W$, where $\mathbf Y \in \mathbb R^{n\times m}$, $\mathbf \Pi \in \mathbb R^{n\times n}$, $\mathbf X\in \mathbb R^{n\times p}$, $\mathbf B \in \mathbb R^{p\times m}$, and $\mathbf W\in \mathbb R^{n\times m}$ represent, respectively, the sensing results, the (unknown or missing) correspondence information, the sensing matrix, the signal of interest, and the additive sensing noise. Given the observation $\mathbf Y$ and the sensing matrix $\mathbf X$, we propose a one-step estimator to reconstruct $(\mathbf \Pi, \mathbf B)$. From the computational perspective, our estimator's complexity is $O(n^3 + np^2m)$, which is of the same order as the larger of the costs of a linear assignment algorithm (e.g., $O(n^3)$) and a least-squares algorithm (e.g., $O(np^2 m)$). From the statistical perspective, we divide the minimum $\mathrm{snr}$ requirement into four regimes, namely the unknown, hard, medium, and easy regimes, and present sufficient conditions for correct permutation recovery under each regime: $(i)$ $\mathrm{snr} \geq \Omega(1)$ in the easy regime; $(ii)$ $\mathrm{snr} \geq \Omega(\log n)$ in the medium regime; and $(iii)$ $\mathrm{snr} \geq \Omega((\log n)^{c_0}\cdot n^{{c_1}/{\mathrm{srank}(\mathbf B)}})$ in the hard regime, where $c_0, c_1$ are positive constants and $\mathrm{srank}(\mathbf B)$ denotes the stable rank of $\mathbf B$. Finally, we provide numerical experiments that support these claims.
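To make the computational claim concrete, the following is a minimal sketch of a pipeline with the same two ingredients the abstract names: one linear assignment step and one least-squares step. It is not necessarily the estimator proposed in the paper, whose exact form the abstract does not state. The function names `simulate` and `assign_then_regress`, the proxy `X.T @ Y / n`, the Gaussian design, and the choice to shuffle only `h` of the `n` rows are all assumptions made for illustration.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def simulate(n, p, m, h, sigma, rng):
    """Draw Y = Pi X B + W with only h rows shuffled (an assumption of this demo)."""
    X = rng.standard_normal((n, p))
    B = rng.standard_normal((p, m))
    perm = np.arange(n)
    moved = rng.choice(n, size=h, replace=False)
    perm[moved] = np.roll(moved, 1)  # cyclic shift among the h chosen rows
    # Row i of Y is generated from row perm[i] of X, i.e. Pi[i, perm[i]] = 1.
    Y = X[perm] @ B + sigma * rng.standard_normal((n, m))
    return Y, X, B, perm


def assign_then_regress(Y, X):
    """Hypothetical sketch: one assignment step, then one least-squares step."""
    n = X.shape[0]
    # Crude proxy for B that ignores the shuffle (an assumption, not from the paper).
    B_proxy = X.T @ Y / n                      # shape (p, m)
    # score[i, j] = <Y[i], X[j] @ B_proxy>: how well observation i matches row j of X.
    score = (Y @ B_proxy.T) @ X.T              # shape (n, n)
    # Linear assignment: one row of X per observation, maximizing the total score.
    rows, cols = linear_sum_assignment(score, maximize=True)
    perm_hat = np.empty(n, dtype=int)
    perm_hat[rows] = cols
    # Ordinary least squares on the re-aligned sensing matrix.
    B_hat, *_ = np.linalg.lstsq(X[perm_hat], Y, rcond=None)
    return perm_hat, B_hat


rng = np.random.default_rng(0)
Y, X, B, perm = simulate(n=300, p=20, m=100, h=30, sigma=0.5, rng=rng)
perm_hat, B_hat = assign_then_regress(Y, X)
print("mismatched rows:", int(np.sum(perm_hat != perm)))
print("relative error in B:", np.linalg.norm(B_hat - B) / np.linalg.norm(B))
```

The assignment step maximizes $\sum_i \langle \mathbf Y_i, (\mathbf X \tilde{\mathbf B})_{\sigma(i)} \rangle$ over permutations $\sigma$, which is equivalent to minimizing the total squared distance between the observations and the re-ordered proxy predictions, because the sum of squared norms does not depend on $\sigma$. Whether this sketch recovers the permutation depends on the noise level and on how many rows are shuffled; the regimes quoted in the abstract apply to the paper's estimator, not to this illustration.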