We study the problem of generating a random variate $X$ from a finite discrete probability distribution $P$ using an entropy source of independent fair coin flips. A classic result of Knuth and Yao shows that the optimal expected number of input coin flips per output sample lies between $H(P)$ and $H(P)\,{+}\,2$, where $H$ is the Shannon entropy function. However, implementing the Knuth and Yao ``entropy-optimal'' sampler forces a choice between exponential space with low runtime per sample and linear space with high runtime per sample. We introduce a new sampling algorithm that avoids this tradeoff: it requires linearithmic space, incurs negligible runtime overhead per sample, and uses an expected number of coin flips that lies in the entropy-optimal range $[H(P), H(P)\,{+}\,2)$. No previous sampler for discrete distributions simultaneously achieves these space, time, and entropy characteristics. Numerical experiments demonstrate improvements in both runtime and entropy consumption over the celebrated alias method.
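To make the entropy bound concrete, the following is a minimal Python sketch of the textbook Knuth and Yao column-scanning sampler, restricted to dyadic distributions (every probability is an integer weight over $2^k$). It is not the new algorithm introduced in this work. The names `shannon_entropy`, `knuth_yao_sample`, and `exact_profile` are hypothetical, chosen for illustration only. Coin flips are passed in explicitly so that the output distribution and the expected number of flips can be computed exactly by enumerating all $2^k$ flip sequences.

```python
from fractions import Fraction
from itertools import product
from math import log2


def shannon_entropy(weights, k):
    """Shannon entropy in bits of P, where p_i = weights[i] / 2**k."""
    total = 2 ** k
    return -sum((m / total) * log2(m / total) for m in weights if m > 0)


def knuth_yao_sample(weights, k, bits):
    """Draw one outcome from P using fair coin flips taken from `bits`.

    Assumes a dyadic distribution: integer weights in [0, 2**k) that sum
    to 2**k. Returns (outcome index, number of flips consumed).
    """
    assert sum(weights) == 2 ** k and all(0 <= m < 2 ** k for m in weights)
    bits = iter(bits)
    d = 0  # position among the internal nodes at the current tree level
    for col in range(k):
        d = 2 * d + next(bits)   # descend one level using one coin flip
        shift = k - 1 - col      # selects binary digit `col` of each p_i
        for i in range(len(weights) - 1, -1, -1):
            d -= (weights[i] >> shift) & 1
            if d < 0:            # landed on a leaf labeled with outcome i
                return i, col + 1
    raise AssertionError("unreachable for a valid dyadic distribution")


def exact_profile(weights, k):
    """Exact output distribution and expected flips, by enumeration."""
    counts = [0] * len(weights)
    flips = 0
    for bits in product((0, 1), repeat=k):
        outcome, used = knuth_yao_sample(weights, k, bits)
        counts[outcome] += 1
        flips += used
    total = 2 ** k
    return [Fraction(c, total) for c in counts], Fraction(flips, total)


if __name__ == "__main__":
    weights, k = [3, 3, 2], 3    # P = (3/8, 3/8, 2/8)
    probs, expected_flips = exact_profile(weights, k)
    print(probs, expected_flips, shannon_entropy(weights, k))
```

For $P = (3/8, 3/8, 2/8)$ this sketch consumes $9/4$ flips per sample on average, against an entropy of roughly $1.56$ bits, which is consistent with the range $[H(P), H(P)\,{+}\,2)$. The sketch scans every outcome at each level, so it illustrates the linear-space, high-runtime end of the tradeoff rather than a practical implementation.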