In this work, we study how to efficiently obtain perfect samples from a discrete distribution $\mathcal{D}$ given access only to pairwise comparisons of elements of its support. Specifically, we assume access to samples $(x, S)$, where $S$ is drawn from a distribution $\mathcal{Q}$ over sets (indicating the elements being compared) and $x$ is drawn from the conditional distribution $\mathcal{D}_S$ (indicating the winner of the comparison), and we aim to output a clean sample $y$ distributed according to $\mathcal{D}$. We mainly focus on the case of pairwise comparisons, where all sets $S$ have size 2. We design a Markov chain whose stationary distribution coincides with $\mathcal{D}$ and give an algorithm to obtain exact samples using the technique of Coupling from the Past. However, the sample complexity of this algorithm depends on the structure of the distribution $\mathcal{D}$ and can even be exponential in the size of the support of $\mathcal{D}$ in many natural scenarios. Our main contribution is an efficient exact sampling algorithm whose complexity does not depend on the structure of $\mathcal{D}$. To this end, we give a parametric Markov chain that mixes significantly faster when given a good approximation to the stationary distribution. We can obtain such an approximation using an efficient algorithm for learning from pairwise comparisons (Shah et al., JMLR 17, 2016). Our technique for speeding up sampling from a Markov chain whose stationary distribution is approximately known is simple, general, and possibly of independent interest.
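To make the setting concrete, the following is a minimal sketch of one natural comparison-driven chain. It is an assumption for illustration, not necessarily the chain constructed in the paper. From the current state, draw a comparison $(x, S)$; if the current state belongs to $S$, move to the winner $x$, otherwise stay put. Assuming $\mathcal{D}_S$ is $\mathcal{D}$ restricted to $S$ and renormalized, this chain satisfies detailed balance with respect to $\mathcal{D}$. The names `transition_matrix`, `apply_left`, `step`, `D`, and `Q` are hypothetical, and the example does not implement Coupling from the Past or the faster parametric chain.

```python
from fractions import Fraction
from itertools import combinations


def transition_matrix(d, q):
    """Build the transition matrix of the illustrative comparison chain.

    d: list of Fractions, the target distribution D over states 0..n-1.
    q: dict mapping frozenset({i, j}) to the probability that Q proposes that pair.
    Assumption: the winner of {i, j} is i with probability d[i] / (d[i] + d[j]).
    """
    n = len(d)
    P = [[Fraction(0)] * n for _ in range(n)]
    for pair, q_s in q.items():
        i, j = tuple(pair)
        win_i = d[i] / (d[i] + d[j])
        P[j][i] += q_s * win_i        # at j, the pair is drawn and i wins
        P[i][j] += q_s * (1 - win_i)  # at i, the pair is drawn and j wins
    for i in range(n):
        # Remaining mass: the state is not compared, or it wins its comparison.
        P[i][i] = 1 - sum(P[i][k] for k in range(n) if k != i)
    return P


def apply_left(d, P):
    """Return the row vector d * P."""
    n = len(d)
    return [sum(d[i] * P[i][k] for i in range(n)) for k in range(n)]


def step(state, sample_comparison):
    """One step driven by a comparison oracle returning (winner, pair)."""
    winner, pair = sample_comparison()
    return winner if state in pair else state


# Toy instance (made up for illustration): three elements, uniform Q over pairs.
D = [Fraction(1, 2), Fraction(1, 3), Fraction(1, 6)]
Q = {frozenset(p): Fraction(1, 3) for p in combinations(range(3), 2)}
P = transition_matrix(D, Q)
```

Exact rational arithmetic lets the stationarity check `apply_left(D, P) == D` hold with equality rather than up to floating-point tolerance. The check says nothing about mixing time, which is the quantity the abstract identifies as potentially exponential for a chain of this kind.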