We propose KOALA++, a scalable Kalman-based optimization algorithm that explicitly models structured gradient uncertainty in neural network training. Unlike second-order methods, which rely on expensive curvature (second-order derivative) computations, our method directly estimates the parameter covariance matrix by recursively updating compact gradient covariance products. This design improves on the original KOALA framework, which assumed a diagonal covariance, by implicitly capturing richer uncertainty structure without storing the full covariance matrix or performing large matrix inversions. Across diverse tasks, including image classification and language modeling, KOALA++ achieves accuracy on par with or better than state-of-the-art first- and second-order optimizers while maintaining the efficiency of first-order methods.
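To make the Kalman-filtering view of optimization concrete, the following is a minimal, hypothetical sketch (not the authors' exact KOALA++ update): the parameter vector is treated as the filter state, the scalar training loss as the noisy observation, and the gradient as the linearized observation operator. The covariance here is kept diagonal for simplicity, as in the original KOALA assumption that KOALA++ relaxes; the hyperparameters `R` and `Q` are illustrative choices.

```python
import numpy as np

def kalman_step(w, P_diag, g, loss, target=0.0, R=1.0, Q=1e-4):
    """One Kalman-filter-style optimizer step with a diagonal covariance.

    Illustrative sketch only; names and hyperparameters are assumptions.
    w       : parameter vector (filter state), shape (d,)
    P_diag  : diagonal of the state covariance, shape (d,)
    g       : gradient of the loss at w (linearized observation), shape (d,)
    loss    : observed scalar loss value
    target  : desired loss value used as the observation, typically 0
    R, Q    : observation- and process-noise variances (hyperparameters)
    """
    P_diag = P_diag + Q                 # predict step: inflate covariance
    S = g @ (P_diag * g) + R            # scalar innovation variance
    K = (P_diag * g) / S                # Kalman gain, shape (d,)
    w = w - K * (loss - target)         # correct parameters toward target loss
    P_diag = P_diag - K * (g * P_diag)  # shrink covariance after the update
    return w, P_diag

# Usage: drive a toy quadratic loss 0.5 * ||w||^2 toward zero.
w, P = np.array([2.0, -3.0]), np.ones(2)
for _ in range(50):
    loss, g = 0.5 * w @ w, w            # loss value and its gradient
    w, P = kalman_step(w, P, g, loss)
```

Note that the step size emerges from the gain `K` rather than a hand-tuned learning rate: directions with large covariance (high parameter uncertainty) receive larger corrections.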