Estimating the gradients of stochastic nodes in stochastic computational graphs is a central research question in deep generative modeling, because such estimates enable gradient-descent optimization of neural network parameters. Stochastic gradient estimators for discrete random variables have been widely explored for a few distributions, for example the Gumbel-Softmax reparameterization trick for the Bernoulli and categorical distributions. Other discrete distributions, such as the Poisson, geometric, binomial, multinomial, and negative binomial, remain largely unexplored. This paper proposes a generalized version of the Gumbel-Softmax estimator that can reparameterize generic discrete distributions, not only the Bernoulli and the categorical. The proposed estimator combines truncation of discrete random variables, the Gumbel-Softmax trick, and a special form of linear transformation. Our experiments consist of (1) synthetic examples and applications to VAEs, which show the efficacy of our method; and (2) topic models, which demonstrate the value of the proposed estimator in practice.
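The three ingredients named in the abstract (truncation, the Gumbel-Softmax trick, and a linear transformation) can be illustrated with a minimal NumPy sketch for a Poisson variable. This is an assumption about how the pieces might fit together, not the paper's exact construction: the function names `truncated_poisson_log_probs` and `relaxed_discrete_sample`, the renormalization of the truncated mass, the truncation level `n`, and the temperature `tau` are all hypothetical choices made for illustration.

```python
import numpy as np


def truncated_poisson_log_probs(rate, n):
    """Log-probabilities of a Poisson(rate) truncated to the support {0, ..., n-1}.

    Assumption: the truncated mass is renormalized to sum to one.
    """
    k = np.arange(n)
    # log(k!) computed as a cumulative sum of logs, with log(0!) = 0.
    log_factorial = np.concatenate(([0.0], np.cumsum(np.log(np.arange(1, n)))))
    log_pmf = k * np.log(rate) - rate - log_factorial
    # Renormalize with a numerically stable log-sum-exp.
    m = log_pmf.max()
    return log_pmf - (m + np.log(np.sum(np.exp(log_pmf - m))))


def relaxed_discrete_sample(log_probs, tau, rng):
    """Gumbel-Softmax relaxation followed by a linear map onto the support.

    Returns the relaxed one-hot weights and the relaxed sample value.
    """
    gumbel_noise = rng.gumbel(size=log_probs.shape)
    scores = (log_probs + gumbel_noise) / tau
    scores = scores - scores.max()          # stabilize the softmax
    weights = np.exp(scores)
    weights = weights / weights.sum()       # relaxed one-hot vector
    support = np.arange(log_probs.shape[0])  # linear transformation: values 0..n-1
    return weights, float(weights @ support)


# Hypothetical usage: a relaxed draw from Poisson(3) truncated at n = 30.
rng = np.random.default_rng(0)
log_probs = truncated_poisson_log_probs(rate=3.0, n=30)
weights, value = relaxed_discrete_sample(log_probs, tau=0.1, rng=rng)
```

Because `value` is a smooth function of `log_probs` (and hence of `rate`) for any fixed Gumbel noise, the same computation written in an automatic-differentiation framework would yield a gradient with respect to the distribution parameter. As `tau` decreases, the weights approach a one-hot vector and `value` approaches an integer draw from the truncated distribution, at the cost of higher-variance gradients.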