Computing the gradient of an expectation with respect to the parameters of a discrete distribution is a problem that arises in many fields of science and engineering. It is typically tackled with Reinforce, which frames gradient estimation as a Monte Carlo simulation. Unfortunately, the Reinforce estimator is especially sensitive to discrepancies between the true probability distribution and the drawn samples, a common issue in low-sample regimes that results in inaccurate gradient estimates. In this paper, we introduce DBsurf, a Reinforce-based estimator for discrete distributions that uses a novel sampling procedure to reduce the discrepancy between the samples and the actual distribution. To assess the performance of our estimator, we evaluate it on a diverse set of tasks. Among existing estimators, DBsurf attains the lowest variance on a least-squares problem commonly used in the literature for benchmarking. Furthermore, DBsurf achieves the best results for training variational autoencoders (VAEs) across different datasets and sampling setups. Finally, we apply DBsurf to build a simple and efficient Neural Architecture Search (NAS) algorithm with state-of-the-art performance.
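For context, the plain Reinforce (score-function) estimator that the abstract takes as its starting point can be sketched as below. This is not DBsurf and does not reproduce its sampling procedure; it only illustrates, under our own assumptions, how a Monte Carlo gradient estimate for a categorical distribution can drift from the exact gradient when few samples are drawn. The softmax-over-logits parameterization, the function names (`softmax`, `exact_gradient`, `reinforce_gradient`) and the toy objective values are all illustrative choices, not taken from the paper.

```python
import numpy as np


def softmax(logits):
    """Numerically stable softmax over a 1-D array of logits."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()


def exact_gradient(logits, f_values):
    """Exact gradient of E_{x~p}[f(x)] w.r.t. the logits of a categorical p.

    For p = softmax(logits), d/d logits_j of sum_i p_i f_i equals
    p_j * (f_j - E[f]).
    """
    p = softmax(logits)
    return p * (f_values - p @ f_values)


def reinforce_gradient(logits, f_values, n_samples, rng):
    """Plain Reinforce estimate: mean of f(x) * grad log p(x) over samples."""
    p = softmax(logits)
    k = len(p)
    samples = rng.choice(k, size=n_samples, p=p)
    # The gradient of log p(x) w.r.t. the logits is onehot(x) - p.
    scores = np.eye(k)[samples] - p
    return (f_values[samples][:, None] * scores).mean(axis=0)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    logits = np.zeros(4)                      # hypothetical parameters
    f_values = np.array([0.0, 1.0, 4.0, 9.0])  # hypothetical objective values
    target = exact_gradient(logits, f_values)
    for n in (5, 50, 5000):
        estimate = reinforce_gradient(logits, f_values, n, rng)
        print(n, np.abs(estimate - target).max())
```

With only a handful of draws, the empirical frequencies of the sampled categories can differ noticeably from `p`, and the estimate inherits that mismatch. This is the discrepancy the abstract says DBsurf's sampling procedure is designed to reduce.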