We show that many machine-learning algorithms are specific instances of a single algorithm called the Bayesian learning rule. The rule, derived from Bayesian principles, yields a wide range of algorithms from fields such as optimization, deep learning, and graphical models. These include classical algorithms such as ridge regression, Newton's method, and the Kalman filter, as well as modern deep-learning algorithms such as stochastic-gradient descent, RMSprop, and Dropout. The key idea in deriving such algorithms is to approximate the posterior using candidate distributions estimated using natural gradients. Different candidate distributions result in different algorithms, and further approximations to the natural gradients give rise to variants of those algorithms. Our work not only unifies, generalizes, and improves existing algorithms, but also helps us design new ones.
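As a rough illustration of how one candidate distribution can recover a classical algorithm, the sketch below uses a one-dimensional Gaussian candidate q = N(m, 1/s) on a ridge-regression loss. The abstract does not state the update equations, so the form used here is an assumption: the natural-gradient step for a Gaussian candidate is taken to be a precision update s <- (1 - rho) s + rho E_q[Hessian] followed by a mean update m <- m - rho E_q[gradient] / s. The function names, the toy data, and the regularizer `delta` are hypothetical and chosen only for this example.

```python
def blr_gaussian_step(m, s, grad, hess, rho):
    """One assumed Bayesian-learning-rule step for a 1-D Gaussian q = N(m, 1/s).

    grad and hess stand for the expected gradient and Hessian of the loss under q.
    """
    s_new = (1.0 - rho) * s + rho * hess  # precision update
    m_new = m - rho * grad / s_new        # mean update, scaled by the new precision
    return m_new, s_new


def ridge_via_blr(xs, ys, delta, rho=1.0, steps=1):
    """Minimize 0.5 * sum((y - x * theta)^2) + 0.5 * delta * theta^2 with the step above."""
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    hess = sxx + delta  # the loss is quadratic, so its Hessian is constant
    m, s = 0.0, 1.0     # arbitrary initial mean and precision
    for _ in range(steps):
        # For a quadratic loss the gradient is linear in theta, so its
        # expectation under q equals the gradient evaluated at the mean m.
        grad = hess * m - sxy
        m, s = blr_gaussian_step(m, s, grad, hess, rho)
    return m, s


# Hypothetical toy data.
xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]
delta = 1.0

m, s = ridge_via_blr(xs, ys, delta)  # rho = 1, a single step
closed_form = sum(x * y for x, y in zip(xs, ys)) / (sum(x * x for x in xs) + delta)
```

Under these assumptions, a single step with rho = 1 lands on the closed-form ridge estimate, which for a quadratic loss is also one Newton step. Smaller values of rho reach the same point over several iterations.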