We revisit the following problem: given a set of indices $S = \{1, \dots, n\}$ and weights $w_1, \dots, w_n \in \mathbb{R}_{> 0}$, provide samples from $S$ with distribution $p(i) = w_i / W$, where $W = \sum_j w_j$ is the normalizing constant. In the static setting, a simple data structure due to Walker, the Alias Table, allows samples to be drawn in constant time. A more challenging task is to maintain the distribution in a dynamic setting, where elements may be added or removed and weights may change over time; here, existing solutions restrict the permissible weights, require rebuilding the associated data structure after a number of updates, or are rather complex. In this paper, we describe, analyze, and engineer a simple data structure for maintaining a discrete probability distribution in the dynamic setting. Construction of the data structure for an arbitrary distribution takes time $O(n)$, sampling takes expected time $O(1)$, and updates of size $\Delta = O(W / n)$ can be processed in time $O(1)$. To evaluate the efficiency of the data structure, we conduct an experimental study. The results suggest that its sampling performance is comparable to that of the static Alias Table, with only a minor slowdown.
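For orientation, the static baseline mentioned in the abstract can be sketched as follows. This is a minimal illustration of a static alias table built with the commonly used Vose-style construction, not the dynamic data structure proposed in the paper; the function names `build_alias_table` and `sample` are assumptions chosen for this example.

```python
import random


def build_alias_table(weights):
    """Build a static alias table in O(n) time (Vose-style construction).

    Returns (prob, alias): bucket i yields item i with probability prob[i]
    and item alias[i] otherwise. Hypothetical helper for illustration only.
    """
    n = len(weights)
    total = sum(weights)  # W, the normalizing constant
    # Rescale so that the average weight is 1, i.e. each bucket has capacity 1.
    scaled = [w * n / total for w in weights]
    prob = [1.0] * n
    alias = list(range(n))
    small = [i for i, s in enumerate(scaled) if s < 1.0]
    large = [i for i, s in enumerate(scaled) if s >= 1.0]
    while small and large:
        s = small.pop()
        l = large.pop()
        # Bucket s keeps its own mass and is topped up with mass from item l.
        prob[s] = scaled[s]
        alias[s] = l
        scaled[l] -= 1.0 - scaled[s]
        (small if scaled[l] < 1.0 else large).append(l)
    # Buckets left over (floating-point residue) keep prob 1.0 and alias themselves.
    return prob, alias


def sample(prob, alias, rng=random):
    """Draw one index in O(1): pick a uniform bucket, then flip a biased coin."""
    i = rng.randrange(len(prob))
    return i if rng.random() < prob[i] else alias[i]
```

Calling `build_alias_table([1.0, 2.0, 3.0, 4.0])` once and then `sample(prob, alias)` repeatedly draws indices 0 to 3 with probabilities proportional to the weights. Any change to a weight requires rebuilding the table in this sketch, which is exactly the limitation the dynamic setting addresses.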