Estimating the gradients of stochastic nodes is a central research question in deep generative modeling, because these gradients enable gradient-descent optimization of neural network parameters. The estimation problem becomes more complex when the stochastic nodes are discrete, because pathwise derivative techniques cannot be applied. Hence, stochastic gradient estimation for discrete distributions requires either a score function method or a continuous relaxation of the discrete random variables. This paper proposes a generalized version of the Gumbel-Softmax estimator with continuous relaxation, which can relax the discreteness of a more diverse range of probability distributions beyond the categorical and Bernoulli. Specifically, we use truncation of discrete random variables and the Gumbel-Softmax trick with a linear transformation for the relaxed reparameterization. The proposed approach enables the relaxed discrete random variable to be reparameterized and to be backpropagated through in a large-scale stochastic computational graph. Our experiments consist of (1) synthetic data analyses, which show the efficacy of our methods; and (2) applications to a VAE and a topic model, which demonstrate the value of the proposed estimator in practice.
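The truncation and linear-transformation steps named in the abstract can be illustrated with a small sketch. It is not the paper's implementation: the abstract gives no formulas, so the choice of a Poisson distribution, the truncation level `n_max`, the temperature value, and all function names below are assumptions made for illustration. The idea shown is to restrict a count distribution to the finite support {0, ..., n_max - 1}, draw a Gumbel-Softmax sample over that support, and map the resulting soft one-hot vector back to a scalar by a dot product with the support values.

```python
import math
import random


def truncated_poisson_logits(rate, n_max):
    """Unnormalized log-pmf of Poisson(rate) on the truncated support {0, ..., n_max - 1}.

    The softmax applied later renormalizes over the truncated support.
    (Hypothetical helper; Poisson is an assumed example distribution.)
    """
    return [k * math.log(rate) - rate - math.lgamma(k + 1) for k in range(n_max)]


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def relaxed_sample(logits, temperature, rng):
    """Draw one Gumbel-Softmax sample and its linear transformation.

    Returns (weights, value): `weights` is a soft one-hot vector over the
    truncated support, and `value` is its dot product with [0, 1, ..., n_max - 1].
    """
    perturbed = []
    for logit in logits:
        # Clamp the uniform draw away from 0 and 1 so both logarithms are finite.
        u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)
        gumbel = -math.log(-math.log(u))  # Gumbel(0, 1) noise
        perturbed.append((logit + gumbel) / temperature)
    weights = softmax(perturbed)
    value = sum(k * w for k, w in enumerate(weights))  # linear transformation
    return weights, value


if __name__ == "__main__":
    rng = random.Random(0)
    logits = truncated_poisson_logits(rate=3.0, n_max=20)
    weights, value = relaxed_sample(logits, temperature=0.5, rng=rng)
    print("relaxed count:", value)
```

Because `value` is a differentiable function of `logits` for a fixed noise draw, the same computation written in an automatic-differentiation framework would let gradients flow to the distribution parameter (here `rate`). Lower temperatures push `weights` toward a one-hot vector, so `value` approaches an integer count, typically at the cost of higher gradient variance.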