A common way to analyze the learning of statistical models is to consider operations in the model's parameter space; however, this becomes challenging when there is no one-to-one mapping between the parameter space and the underlying space of statistical models. Such "singular models" occur frequently and exhibit a characteristic slowdown in the convergence of learning trajectories due to attractor behavior. In this work, we consider a relative reparameterization of the parameter space, which yields a general method for extracting regular sub-models from singular models. Using Gaussian Mixture Models and Neural Networks as examples, we analyze, both theoretically and numerically, the convergence rate of Gradient Descent under both parameterizations. By analyzing second-order methods and explicit properties of the Fisher Information Matrix, we separate differences in convergence behavior that arise from algorithmic aspects from those that arise from intrinsic information-geometric aspects.