Natural gradient methods have been used to optimise the parameters of probability distributions in a variety of settings, often resulting in fast-converging procedures. Unfortunately, for many distributions of interest, computing the natural gradient poses a number of challenges. In this work we propose a novel technique for tackling these challenges: we reframe the optimisation as one with respect to the parameters of a surrogate distribution, for which computing the natural gradient is easy. We give several examples of existing methods that can be interpreted as applying this technique, and propose a new method for applying it to a wide variety of problems. Our method expands the set of distributions that can be efficiently targeted with natural gradients. Furthermore, it is fast, easy to understand, simple to implement using standard autodiff software, and does not require lengthy model-specific derivations. We demonstrate our method on maximum likelihood estimation and variational inference tasks.
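As a rough illustration of the "easy" case referred to above, the following minimal sketch runs natural gradient descent for maximum likelihood estimation of a univariate Gaussian, whose Fisher information in the (mu, sigma) parameterisation has the closed form diag(1/sigma^2, 2/sigma^2). It is not the surrogate method proposed in this work; the function names `natural_gradient_step` and `fit_gaussian`, the step size, and the toy data are all assumptions made for illustration.

```python
import math


def natural_gradient_step(mu, sigma, data, lr):
    """One natural-gradient step on the average Gaussian negative log-likelihood."""
    n = len(data)
    mean_resid = sum(x - mu for x in data) / n
    mean_sq = sum((x - mu) ** 2 for x in data) / n

    # Ordinary gradient of log(sigma) + (x - mu)^2 / (2 sigma^2), averaged over the data.
    g_mu = -mean_resid / sigma**2
    g_sigma = 1.0 / sigma - mean_sq / sigma**3

    # The Fisher information of N(mu, sigma^2) in (mu, sigma) is
    # diag(1/sigma^2, 2/sigma^2), so multiplying by its inverse is an
    # elementwise rescaling of the ordinary gradient.
    nat_mu = sigma**2 * g_mu
    nat_sigma = 0.5 * sigma**2 * g_sigma

    return mu - lr * nat_mu, sigma - lr * nat_sigma


def fit_gaussian(data, mu=0.0, sigma=1.0, lr=0.5, steps=100):
    """Repeat natural-gradient steps from a hypothetical starting point (mu, sigma)."""
    for _ in range(steps):
        mu, sigma = natural_gradient_step(mu, sigma, data, lr)
    return mu, sigma


if __name__ == "__main__":
    mu_hat, sigma_hat = fit_gaussian([1.0, 2.0, 3.0, 4.0, 5.0])
    print(mu_hat, sigma_hat)
```

With these settings the iterates should approach the sample mean and the maximum likelihood standard deviation of the data. For distributions without such a closed-form Fisher matrix this rescaling is not available directly, which is the difficulty the surrogate reformulation is meant to address.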