Conditional sampling is a fundamental task in Bayesian statistics and generative modeling. Consider the problem of sampling from the posterior distribution $P_{X|Y=y^*}$ for some observation $y^*$, where the likelihood $P_{Y|X}$ is known and we are given $n$ i.i.d. samples $D=\{X_i\}_{i=1}^n$ drawn from an unknown prior distribution $\pi_X$. Suppose that $f(\hat{\pi}_{X^n})$ is the distribution of a posterior sample generated by an algorithm (e.g. a conditional generative model or Bayes' rule) when $\hat{\pi}_{X^n}$ is the empirical distribution of the training data. Although $\mathbb{E}_D\left(\hat{\pi}_{X^n}\right)= \pi_X$ when we average over the randomness of the training data $D$, we do not have $\mathbb{E}_D\left\{f(\hat{\pi}_{X^n})\right\}= f(\pi_X)$ in general, because $f$ is nonlinear; the naive plug-in approach is therefore biased. In this paper we propose a black-box debiasing scheme that improves the accuracy of such a plug-in approach. For any integer $k$, and under boundedness of the likelihood and smoothness of $f$, we generate samples $\hat{X}^{(1)},\dots,\hat{X}^{(k)}$ and weights $w_1,\dots,w_k$ such that $\sum_{i=1}^kw_iP_{\hat{X}^{(i)}}$ is a $k$-th order approximation of $f(\pi_X)$, where the generation process treats $f$ as a black box. Our generation process achieves higher accuracy when averaged over the randomness of the training data, without degrading the variance, which can be interpreted as improving memorization without compromising generalization in generative models.
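The plug-in bias described above, and the idea of canceling it with a weighted combination of black-box evaluations, can be illustrated on a toy functional. This is a minimal sketch, not the paper's actual scheme: here $f$ is the (nonlinear) functional $f(\pi) = (\mathbb{E}_\pi[X])^2$, its plug-in estimate has bias $\mathrm{Var}(X)/n$, and a jackknife-style combination with weights $(2, -1)$ over full- and half-sample plug-ins cancels the $O(1/n)$ term, in the spirit of a $k=2$ combination $\sum_i w_i f(\hat{\pi}_i)$.

```python
import numpy as np

# Illustrative toy example (NOT the paper's scheme): the plug-in estimate of
# the nonlinear functional f(pi) = (E[X])^2 satisfies
#   E[(sample mean)^2] = (E[X])^2 + Var(X)/n,
# so it is biased by Var(X)/n. Combining the full-sample plug-in with
# half-sample plug-ins using weights (2, -1) cancels the 1/n bias term,
# while f is only ever evaluated as a black box on empirical distributions.

rng = np.random.default_rng(0)
n, trials = 100, 200_000
true_value = 1.0  # for Exponential(1), E[X] = 1, so f(pi) = 1

plugin, debiased = [], []
for _ in range(trials):
    x = rng.exponential(1.0, size=n)
    full = x.mean() ** 2                                       # f on all n samples
    half = (x[: n // 2].mean() ** 2 + x[n // 2:].mean() ** 2) / 2  # f on n/2 samples, averaged
    plugin.append(full)
    debiased.append(2 * full - half)  # weights (2, -1): the Var(X)/n terms cancel

print(abs(np.mean(plugin) - true_value))    # close to Var(X)/n = 0.01
print(abs(np.mean(debiased) - true_value))  # much smaller (the 1/n term cancels)
```

The weights $(2,-1)$ are the order-$2$ analogue of the paper's $w_1,\dots,w_k$: each higher $k$ would combine plug-ins at more subsample sizes to cancel successively higher-order bias terms, still treating $f$ as a black box.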