We address the problem of biased gradient estimation in deep Boltzmann machines (DBMs). The existing method for obtaining an unbiased estimator uses a maximal coupling based on a Gibbs sampler, but when the state is high-dimensional, the coupling takes a long time to converge. In this study, we propose to use a coupling based on the Metropolis-Hastings (MH) algorithm and to initialize the state around a local mode of the target distribution. Because MH tends to reject proposals made from such a state, the coupling converges in only one step with high probability, leading to high efficiency. We find that our method allows DBMs to be trained in an end-to-end fashion without greedy pretraining. We also propose some practical techniques to further improve the performance of DBMs. We empirically demonstrate that our training algorithm enables DBMs to achieve generative performance comparable to that of other deep generative models, reaching an FID score of 10.33 on MNIST.
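The one-step meeting behaviour described in the abstract can be illustrated with a toy sketch. Everything here is an assumption made for illustration rather than the authors' implementation: the energy function, the single-bit "set bit `i` to value `v`" proposal, the greedy search `find_local_mode`, and the coupling itself, which simply shares the proposal and the acceptance uniform between two chains (a common-random-numbers coupling, not necessarily the coupling used in the paper). The chains are lagged by one step, so if the first MH proposal from the local mode is rejected, the leading chain has not moved and the two chains meet at step 1.

```python
import numpy as np

def energy(x, W, b):
    # Boltzmann-machine-style energy over binary states x in {0, 1}^d (toy model).
    return -0.5 * x @ W @ x - b @ x

def find_local_mode(x, W, b):
    # Hypothetical helper: greedy single-bit descent, flipping any bit that
    # lowers the energy until no single flip helps.
    x = x.copy()
    improved = True
    while improved:
        improved = False
        for i in range(len(x)):
            y = x.copy()
            y[i] = 1 - y[i]
            if energy(y, W, b) < energy(x, W, b):
                x, improved = y, True
    return x

def mh_step(x, i, v, u, W, b):
    # One MH step with the symmetric proposal "set bit i to value v".
    y = x.copy()
    y[i] = v
    # Accept with probability min(1, exp(E(x) - E(y))).
    if np.log(u) < energy(x, W, b) - energy(y, W, b):
        return y
    return x

def meeting_time(x0, W, b, rng, max_steps=1000):
    # Lagged coupling: chain X runs one step ahead of chain Y; both start at x0.
    d = len(x0)
    x = mh_step(x0, rng.integers(d), rng.integers(2), rng.random(), W, b)  # X_1
    y = x0.copy()                                                          # Y_0
    for t in range(1, max_steps + 1):
        if np.array_equal(x, y):
            return t  # first t with X_t == Y_{t-1}
        # Assumed coupling: both chains share the proposal (i, v) and uniform u.
        i, v, u = rng.integers(d), rng.integers(2), rng.random()
        x = mh_step(x, i, v, u, W, b)
        y = mh_step(y, i, v, u, W, b)
    return None  # chains did not meet within max_steps

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    d = 8
    W, b = np.zeros((d, d)), 20.0 * np.ones(d)  # sharply peaked toy target
    mode = find_local_mode(rng.integers(2, size=d), W, b)
    taus = [meeting_time(mode, W, b, rng) for _ in range(100)]
    print("fraction meeting in one step:", np.mean([t == 1 for t in taus]))
```

In this toy target every single-bit move away from the mode raises the energy by 20, so proposals from the mode are almost always rejected and the meeting time is almost always 1. With a flatter energy landscape or a random starting state, acceptance becomes more likely and the meeting time grows.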