Motivated by the popularity of stochastic rounding in machine learning and in the training of large-scale deep neural network models, we consider stochastic nearness rounding of real matrices $\mathbf{A}$ with many more rows than columns. We provide novel theoretical evidence, supported by extensive experimental evaluation, that with high probability the smallest singular value of a stochastically rounded matrix is well bounded away from zero -- regardless of how close $\mathbf{A}$ is to being rank deficient, and even if $\mathbf{A}$ is exactly rank deficient. In other words, stochastic rounding \textit{implicitly regularizes} tall-and-skinny matrices $\mathbf{A}$, so that the rounded version has full column rank. Our proofs leverage powerful results in random matrix theory, together with the idea that stochastic rounding errors do not concentrate in low-dimensional column spaces.
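The phenomenon described above is easy to observe numerically. The following sketch (not the paper's code; the grid spacing \texttt{eps}, the test matrix, and all variable names are illustrative assumptions) applies unbiased stochastic nearness rounding to a tall, exactly rank-deficient matrix and checks the smallest singular value of the result:

```python
# Sketch: stochastic nearness rounding of a tall, rank-deficient matrix.
# Assumptions (not from the paper): grid spacing eps = 2**-8 and a
# rank-1 Gaussian test matrix of size 1000 x 50.
import numpy as np

rng = np.random.default_rng(0)

def stochastic_round(A, eps):
    """Round each entry of A to one of the two nearest multiples of eps,
    rounding up with probability equal to the fractional part, so the
    rounding is unbiased: E[stochastic_round(a)] = a."""
    scaled = A / eps
    floor = np.floor(scaled)
    frac = scaled - floor                 # distance to the lower grid point
    up = rng.random(A.shape) < frac       # round up with probability frac
    return eps * (floor + up)

# Tall matrix that is rank deficient by construction: rank(A) = 1.
m, n = 1000, 50
u = rng.standard_normal((m, 1))
v = rng.standard_normal((1, n))
A = u @ v

eps = 2.0 ** -8                           # assumed rounding grid spacing
A_sr = stochastic_round(A, eps)

sigma_min = np.linalg.svd(A_sr, compute_uv=False)[-1]
print(f"rank(A) = {np.linalg.matrix_rank(A)}")
print(f"smallest singular value after stochastic rounding = {sigma_min:.3e}")
```

Although $\mathbf{A}$ has rank one, the rounded matrix is full column rank with high probability: the rounding errors act like a dense random perturbation that does not concentrate in any low-dimensional column space.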