Theoretical studies of machine learning models commonly consider different limiting regimes in which the learning dynamics of gradient descent become theoretically tractable. It is desirable, however, to have a systematically obtained picture of all qualitatively different extreme learning regimes for a particular type of model. In this paper we propose such a picture for large weight-tied linear autoencoders characterized by four parameters: input dimension, latent dimension, initialization magnitude, and training set size. This model is nonlinear in the weights, and its gradient flow does not have a general theoretical solution. We show that, at the level of the formal loss-expansion hierarchy, its extreme regimes are naturally associated with faces of a triangular prism. In particular, there are five basic extreme regimes associated with the 2-faces of the prism: (1) large-data, (2) small-data, (3) mean-field, (4) narrow-latent, and (5) free. For regimes (1)-(4), we derive explicit expressions for the limiting evolution of both the train and population losses under gradient flow, obtaining very good agreement with experimental results.
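To make the setting concrete, the following is a minimal sketch of a weight-tied linear autoencoder trained by small-step gradient descent as a discretization of gradient flow. It is not the paper's code. The loss normalization, the isotropic Gaussian data, the Gaussian initialization, and all names (`train_loss`, `population_loss`, `grad`, `run`, `sigma`, `lr`) are assumptions made for illustration; only the four parameters (input dimension `d`, latent dimension `m`, initialization magnitude `sigma`, training set size `n`) come from the abstract.

```python
import numpy as np

def train_loss(W, X):
    # Assumed normalization: 1/(2n) * sum_i ||x_i - W^T W x_i||^2.
    R = X - X @ W.T @ W
    return 0.5 * np.mean(np.sum(R ** 2, axis=1))

def population_loss(W):
    # Valid only under the assumption x ~ N(0, I):
    # E ||x - W^T W x||^2 / 2 = ||I - W^T W||_F^2 / 2.
    d = W.shape[1]
    return 0.5 * np.sum((np.eye(d) - W.T @ W) ** 2)

def grad(W, X):
    # Gradient of train_loss with respect to W; the loss is quartic in W,
    # so the dynamics are nonlinear even though the model is linear in x.
    C = X.T @ X / X.shape[0]   # empirical covariance
    A = W.T @ W                # tied encoder/decoder product
    return -W @ (2 * C - C @ A - A @ C)

def run(d=20, m=5, n=200, sigma=0.1, lr=0.01, steps=500, seed=0):
    # d: input dimension, m: latent dimension,
    # sigma: initialization magnitude, n: training set size.
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, d))          # assumed isotropic data
    W = sigma * rng.standard_normal((m, d))  # assumed Gaussian init
    losses = [train_loss(W, X)]
    for _ in range(steps):
        W -= lr * grad(W, X)                 # Euler step of gradient flow
        losses.append(train_loss(W, X))
    return W, losses
```

Varying `d`, `m`, `sigma`, and `n` relative to one another in a sketch like this is one way to probe the different regimes numerically, for example by comparing `train_loss` against `population_loss` when `n` is much smaller or much larger than `d`.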