Recently, a sparse version of Student-t Processes, termed sparse variational Student-t Processes, has been proposed to enhance computational efficiency and flexibility on real-world datasets by using stochastic gradient descent. However, conventional gradient-based optimizers such as Adam may not fully exploit the geometry of the parameter space, potentially leading to slower convergence and suboptimal performance. To mitigate these issues, we adopt natural gradient methods from information geometry to optimize the variational parameters of Student-t Processes. This approach leverages the curvature and structure of the parameter space, utilizing tools such as the Fisher information matrix, which in our model is linked to the Beta function. It provides a rigorous mathematical foundation for the natural gradient algorithm when the Student's t-distribution serves as the variational distribution. Additionally, we present a mini-batch algorithm for efficiently computing natural gradients. Experimental results across four benchmark datasets demonstrate that our method consistently accelerates convergence.
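For intuition, the natural gradient preconditions the ordinary (Euclidean) gradient with the inverse Fisher information matrix, so the update follows the steepest direction under the KL geometry of the variational family rather than in raw parameter coordinates. Below is a minimal, hypothetical sketch of one such mini-batch step; `grad_fn` and `fisher_fn` are illustrative placeholders for a model-specific ELBO gradient and Fisher matrix, not the paper's actual algorithm or API.

```python
import numpy as np

def natural_gradient_step(lam, grad_fn, fisher_fn, batch, lr=0.1, damping=1e-6):
    """One generic natural-gradient ascent step on variational parameters `lam`.

    A sketch under stated assumptions: `grad_fn` returns the mini-batch
    Euclidean gradient of the objective and `fisher_fn` the Fisher information
    matrix of the variational distribution (both hypothetical placeholders).
    """
    g = grad_fn(lam, batch)                          # mini-batch Euclidean gradient
    F = fisher_fn(lam) + damping * np.eye(lam.size)  # Fisher matrix, damped for invertibility
    return lam + lr * np.linalg.solve(F, g)          # lam + lr * F^{-1} g

# Toy illustration with a made-up quadratic objective and positive-definite "Fisher".
rng = np.random.default_rng(0)
target = np.array([1.0, -2.0])
grad_fn = lambda lam, batch: batch.mean() * (target - lam)  # toy gradient, peak at `target`
fisher_fn = lambda lam: np.diag(np.abs(lam) + 1.0)          # toy positive-definite matrix
lam = np.zeros(2)
for _ in range(100):
    lam = natural_gradient_step(lam, grad_fn, fisher_fn, rng.normal(1.0, 0.1, 32))
print(lam)  # approaches `target`
```

Damping the Fisher matrix before solving is a common stabilizer when it is near-singular; solving the linear system rather than explicitly inverting the matrix is the standard numerically preferable choice.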