We propose using U-statistics to reduce the variance of gradient estimators in importance-weighted variational inference. The key observation is that, given a base gradient estimator that requires $m > 1$ samples and a total of $n > m$ samples available for estimation, averaging the base estimator over overlapping batches of size $m$ achieves lower variance than averaging over disjoint batches, as is currently done. We use classical U-statistic theory to analyze the variance reduction, and we propose novel approximations with theoretical guarantees to ensure computational efficiency. Empirically, U-statistic variance reduction yields modest to significant improvements in inference performance across a range of models, at little computational cost.
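The contrast between disjoint and overlapping batches can be illustrated with a small simulation. The following is a minimal sketch under simplifying assumptions, not the paper's implementation: the base estimator is taken to be the importance-weighted objective value $\log \frac{1}{m}\sum_i w_i$ rather than its gradient, the log-weights are drawn i.i.d. from a standard normal, and every function name is hypothetical. It compares the usual average over $n/m$ disjoint batches with the complete U-statistic, which averages the same base estimator over all $\binom{n}{m}$ subsets of size $m$.

```python
from itertools import combinations

import numpy as np


def log_mean_exp(x, axis=-1):
    """Numerically stable log(mean(exp(x))) along one axis."""
    mx = np.max(x, axis=axis, keepdims=True)
    return np.squeeze(mx, axis=axis) + np.log(np.mean(np.exp(x - mx), axis=axis))


def disjoint_estimate(log_w, m):
    """Average the base estimator over n/m disjoint batches of size m."""
    reps, n = log_w.shape
    batches = log_w.reshape(reps, n // m, m)  # requires m to divide n
    return log_mean_exp(batches).mean(axis=1)


def complete_u_statistic(log_w, m):
    """Average the base estimator over all C(n, m) overlapping batches."""
    n = log_w.shape[1]
    idx = np.array(list(combinations(range(n), m)))  # shape (C(n, m), m)
    return log_mean_exp(log_w[:, idx]).mean(axis=1)  # gather -> (reps, C, m)


def compare_variances(n=8, m=2, reps=20000, seed=0):
    """Apply both estimators to the same simulated log-weights.

    Assumption for illustration only: log-weights are i.i.d. N(0, 1).
    Returns (mean_disjoint, mean_u, var_disjoint, var_u) across repetitions.
    """
    rng = np.random.default_rng(seed)
    log_w = rng.normal(size=(reps, n))
    d = disjoint_estimate(log_w, m)
    u = complete_u_statistic(log_w, m)
    return d.mean(), u.mean(), d.var(), u.var()


if __name__ == "__main__":
    mean_d, mean_u, var_d, var_u = compare_variances()
    print(f"disjoint batches: mean={mean_d:.4f}, variance={var_d:.5f}")
    print(f"U-statistic:      mean={mean_u:.4f}, variance={var_u:.5f}")
```

Both estimators target the same expectation, so their means should agree closely while the U-statistic shows the smaller variance. Enumerating all $\binom{n}{m}$ subsets quickly becomes expensive as $n$ and $m$ grow, which is the cost that the approximations mentioned in the abstract are meant to address.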