In recent work on the theory of machine learning, it has been observed that heavy-tail properties of Stochastic Gradient Descent (SGD) can be studied in the probabilistic framework of stochastic recursions. In particular, Gürbüzbalaban et al. (arXiv:2006.04740) considered a setup corresponding to linear regression, in which the iterates of SGD can be modelled by a multivariate affine stochastic recursion $X_k=A_k X_{k-1}+B_k$ with independent and identically distributed pairs $(A_k, B_k)$, where $A_k$ is a random symmetric matrix and $B_k$ is a random vector. In this work, we answer several open questions posed in that paper and extend its results by applying the theory of irreducible-proximal (i-p) matrices.
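To make the link between SGD and the affine recursion concrete, here is a minimal sketch. It assumes the standard least-squares loss $\frac{1}{2b}\lVert a x - y\rVert^2$ on a mini-batch of size $b$; the Gaussian features, noisy linear targets, and the helper name `sgd_as_affine_recursion` are hypothetical illustration choices, not taken from the paper. Under that assumption one SGD step is exactly $X_k = A_k X_{k-1} + B_k$ with $A_k = I - \frac{\eta}{b} a^\top a$ (symmetric) and $B_k = \frac{\eta}{b} a^\top y$, and drawing a fresh independent batch at every step makes the pairs $(A_k, B_k)$ i.i.d.

```python
import numpy as np


def sgd_as_affine_recursion(eta, batch_size, num_steps, dim, rng):
    """Run mini-batch SGD on least squares and check, at every step, that the
    update coincides with the affine recursion x_k = A_k x_{k-1} + B_k.

    Hypothetical data model (for illustration only): Gaussian features and
    targets y = a @ ones + small Gaussian noise.
    """
    x = np.zeros(dim)
    for _ in range(num_steps):
        # Fresh, independent batch each step -> (A_k, B_k) are i.i.d.
        a = rng.standard_normal((batch_size, dim))
        y = a @ np.ones(dim) + 0.1 * rng.standard_normal(batch_size)

        # Plain SGD step on the loss (1 / (2b)) * ||a x - y||^2.
        grad = a.T @ (a @ x - y) / batch_size
        x_sgd = x - eta * grad

        # The same step written as an affine map X_k = A_k X_{k-1} + B_k.
        A = np.eye(dim) - (eta / batch_size) * (a.T @ a)  # random symmetric matrix
        B = (eta / batch_size) * (a.T @ y)                # random vector
        x_affine = A @ x + B

        assert np.allclose(A, A.T)          # A_k is symmetric
        assert np.allclose(x_sgd, x_affine)  # both forms agree
        x = x_affine
    return x
```

With a small step size, as in the call `sgd_as_affine_recursion(0.05, 4, 200, 3, np.random.default_rng(0))`, the iterates settle near the true coefficient vector of ones. The heavy-tail behaviour discussed above concerns the stationary law of $X_k$, which this sketch does not attempt to measure.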