For training an encoder network to perform amortized variational inference, the Kullback-Leibler (KL) divergence from the exact posterior to its approximation, known as the inclusive or forward KL, is an increasingly popular choice of variational objective due to the mass-covering property of its minimizer. However, minimizing this objective is challenging. A popular existing approach, Reweighted Wake-Sleep (RWS), suffers from heavily biased gradients and a circular pathology that results in highly concentrated variational distributions. As an alternative, we propose SMC-Wake, a procedure for fitting an amortized variational approximation that uses likelihood-tempered sequential Monte Carlo samplers to estimate the gradient of the inclusive KL divergence. We propose three gradient estimators, all of which are asymptotically unbiased in the number of iterations and two of which are strongly consistent. Our method interleaves stochastic gradient updates, SMC samplers, and iterative improvement of an estimate of the normalizing constant, which reduces the bias introduced by self-normalization. In experiments with both simulated and real datasets, SMC-Wake fits variational distributions that approximate the posterior more accurately than existing methods.
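The core ingredient can be illustrated with a minimal sketch. This is not the authors' SMC-Wake implementation. It runs one likelihood-tempered SMC sampler on a toy conjugate model, assumed here purely for illustration: a N(0, 1) prior on a scalar z and unit-variance Gaussian observations. It then uses the resulting particles to estimate the inclusive KL gradient for a Gaussian approximation q = N(mu, exp(log_s)^2). The sketch relies on the identity ∇ KL(p(z|x) ‖ q(z)) = -E_p[∇ log q(z)], which holds because the posterior does not depend on the variational parameters. It omits SMC-Wake's reuse of samples across iterations and its running normalizing-constant estimate. The names `tempered_smc` and `inclusive_kl_grad`, the temperature schedule, and the random-walk Metropolis moves are all hypothetical choices.

```python
import math
import random


def log_lik(z, xs, noise_sd=1.0):
    """Gaussian log-likelihood: sum_i log N(x_i | z, noise_sd^2)."""
    return sum(
        -0.5 * math.log(2 * math.pi * noise_sd**2) - (x - z) ** 2 / (2 * noise_sd**2)
        for x in xs
    )


def log_prior(z):
    """Standard normal prior: log N(z | 0, 1)."""
    return -0.5 * math.log(2 * math.pi) - 0.5 * z * z


def log_tempered(z, beta, xs):
    """Unnormalized log density of the tempered target p(z) p(x|z)^beta."""
    return log_prior(z) + beta * log_lik(z, xs)


def tempered_smc(xs, n_particles=1000, n_temps=20, n_mcmc=5, step=0.5, seed=0):
    """Likelihood-tempered SMC with beta going from 0 (prior) to 1 (posterior).

    Returns equally weighted particles approximating p(z | x) and an
    estimate of log p(x), the log normalizing constant.
    """
    rng = random.Random(seed)
    betas = [t / n_temps for t in range(n_temps + 1)]
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]  # prior draws
    log_z = 0.0
    for b_prev, b in zip(betas[:-1], betas[1:]):
        # Incremental importance weights p(x|z)^(b - b_prev), computed stably.
        log_w = [(b - b_prev) * log_lik(z, xs) for z in particles]
        m = max(log_w)
        w = [math.exp(lw - m) for lw in log_w]
        # Particles are equally weighted here, so log Z gains log(mean weight).
        log_z += m + math.log(sum(w) / n_particles)
        # Multinomial resampling back to equal weights.
        particles = rng.choices(particles, weights=w, k=n_particles)
        # Random-walk Metropolis moves that leave the current tempered target invariant.
        moved = []
        for z in particles:
            lt = log_tempered(z, b, xs)
            for _ in range(n_mcmc):
                prop = z + step * rng.gauss(0.0, 1.0)
                lp = log_tempered(prop, b, xs)
                # 1 - random() lies in (0, 1], so the log is always defined.
                if math.log(1.0 - rng.random()) < lp - lt:
                    z, lt = prop, lp
            moved.append(z)
        particles = moved
    return particles, log_z


def inclusive_kl_grad(particles, mu, log_s):
    """Estimate the gradient of KL(p(z|x) || q(z)) w.r.t. (mu, log_s).

    q = N(mu, exp(log_s)^2); the particles are equally weighted posterior draws.
    Uses grad KL = -E_p[grad log q(z)].
    """
    s2 = math.exp(2 * log_s)
    n = len(particles)
    g_mu = -sum((z - mu) / s2 for z in particles) / n
    g_log_s = -sum(-1.0 + (z - mu) ** 2 / s2 for z in particles) / n
    return g_mu, g_log_s
```

In an amortized setting, the encoder would output `mu` and `log_s` for each observation, and these per-observation gradients would be backpropagated into the encoder's weights. Evaluating `inclusive_kl_grad` with a `log_s` far smaller than the posterior's log standard deviation yields a large negative gradient in `log_s`, which pushes q to widen. This is the mass-covering behavior that motivates the inclusive objective.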