Given a matrix $A \in \mathbb{R}^{m\times d}$ with singular values $\sigma_1\geq \cdots \geq \sigma_d$, and a random matrix $G \in \mathbb{R}^{m\times d}$ with iid $N(0,T)$ entries for some $T>0$, we derive new bounds on the Frobenius distance between the subspaces spanned by the top-$k$ (right) singular vectors of $A$ and of $A+G$. This problem arises in numerous applications in statistics, where a data matrix may be corrupted by Gaussian noise, and in the analysis of the Gaussian mechanism in differential privacy, where Gaussian noise is added to data to protect private information. We show that, for matrices $A$ in which the gaps between the top-$k$ singular values are roughly $\Omega(\sigma_k-\sigma_{k+1})$, the expected Frobenius distance between the subspaces is $\tilde{O}(\frac{\sqrt{d}}{\sigma_k-\sigma_{k+1}} \times \sqrt{T})$, improving on previous bounds by a factor of $\frac{\sqrt{m}}{\sqrt{d}} \sqrt{k}$. To obtain our bounds, we view the perturbation to the singular vectors as a diffusion process, the Dyson-Bessel process, and use tools from stochastic calculus to track the evolution of the subspace spanned by the top-$k$ singular vectors.
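The quantity being bounded can be estimated numerically. The sketch below is a plain Monte Carlo check, not the Dyson-Bessel analysis itself. It assumes the Frobenius distance is measured between orthogonal projectors, $\|V_kV_k^\top - \hat V_k\hat V_k^\top\|_F$, where $V_k$ and $\hat V_k$ hold the top-$k$ right singular vectors of $A$ and $A+G$. It also assumes a toy spectrum in which the top-$k$ singular values are spaced `gap` apart and the remaining $d-k$ are equal, so $\sigma_k-\sigma_{k+1}$ equals `gap`. All function names and parameter defaults are hypothetical illustrations. The script prints the empirical mean next to $\sqrt{dT}/(\sigma_k-\sigma_{k+1})$. That is only the scale in the stated bound, without its constants or logarithmic factors.

```python
import numpy as np


def top_k_right_projector(M: np.ndarray, k: int) -> np.ndarray:
    """Orthogonal projector onto the span of the top-k right singular vectors of M."""
    # np.linalg.svd returns singular values in descending order; the rows of Vt
    # are the corresponding right singular vectors.
    _, _, Vt = np.linalg.svd(M, full_matrices=False)
    Vk = Vt[:k].T  # d x k matrix with orthonormal columns
    return Vk @ Vk.T


def subspace_distance(A: np.ndarray, B: np.ndarray, k: int) -> float:
    """Assumed metric: ||P_A - P_B||_F between top-k right singular subspaces."""
    diff = top_k_right_projector(A, k) - top_k_right_projector(B, k)
    return float(np.linalg.norm(diff, "fro"))


def matrix_with_spectrum(m: int, sigma: np.ndarray, rng: np.random.Generator) -> np.ndarray:
    """Random m x d matrix (m >= d) whose singular values are exactly `sigma`."""
    d = sigma.size
    U, _ = np.linalg.qr(rng.standard_normal((m, d)))  # m x d, orthonormal columns
    V, _ = np.linalg.qr(rng.standard_normal((d, d)))  # d x d orthogonal
    return (U * sigma) @ V.T  # U diag(sigma) V^T


def mean_distance(m=200, d=20, k=3, gap=50.0, T=1.0, trials=50, seed=0):
    """Monte Carlo estimate of E||P_A - P_{A+G}||_F with G_ij ~ N(0, T) iid.

    Returns (empirical mean, sqrt(d*T)/gap) for the hypothetical toy spectrum.
    """
    rng = np.random.default_rng(seed)
    base = 100.0
    # Top-k singular values spaced `gap` apart; the other d-k sit at `base`,
    # so sigma_k - sigma_{k+1} = gap and every top-k gap also equals gap.
    sigma = np.concatenate([base + gap * np.arange(k, 0, -1), np.full(d - k, base)])
    A = matrix_with_spectrum(m, sigma, rng)
    dists = [
        # N(0, T) has standard deviation sqrt(T).
        subspace_distance(A, A + np.sqrt(T) * rng.standard_normal((m, d)), k)
        for _ in range(trials)
    ]
    reference_scale = np.sqrt(d * T) / gap
    return float(np.mean(dists)), float(reference_scale)


if __name__ == "__main__":
    for gap in (10.0, 50.0, 250.0):
        est, scale = mean_distance(gap=gap)
        print(f"gap={gap}: empirical {est:.4f}, sqrt(dT)/gap {scale:.4f}")
```

Comparing the two printed columns as `gap` varies shows how the empirical distance behaves relative to the $1/(\sigma_k-\sigma_{k+1})$ scaling. Because the bound hides constants and log factors, agreement is expected only in trend, not in absolute value.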