Optimal Dynamic Parameterized Subset Sampling

In this paper, we study the Dynamic Parameterized Subset Sampling (DPSS) problem in the Word RAM model. In DPSS, the input is a set,~$S$, of~$n$ items, where each item,~$x$, has a non-negative integer weight,~$w(x)$. Given a pair of query parameters, $(\alpha, \beta)$, each of which is a non-negative rational number, a parameterized subset sampling query on~$S$ seeks to return a subset $T \subseteq S$ such that each item $x \in S$ is selected in~$T$, independently, with probability $p_x(\alpha, \beta) = \min \left\{\frac{w(x)}{\alpha \sum_{x\in S} w(x)+\beta}, 1 \right\}$. The DPSS problem is defined in a dynamic setting, where the item set,~$S$, can be updated by inserting new items or deleting existing items. Our first main result is an optimal algorithm for the DPSS problem, which achieves~$O(n)$ pre-processing time, $O(1+\mu_S(\alpha,\beta))$ expected time for each query, whose parameters $(\alpha, \beta)$ are given on the fly, and $O(1)$ time for each update; here, $\mu_S(\alpha,\beta)$ is the expected size of the query result. At all times, the worst-case space consumption of our algorithm is linear in the current number of items in~$S$. Our second main contribution is a hardness result for the DPSS problem when the item weights are~$O(1)$-word floating-point numbers, rather than integers. Specifically, we reduce Integer Sorting to the deletion-only DPSS problem with floating-point item weights. Our reduction shows that an optimal algorithm for deletion-only DPSS with floating-point item weights (achieving all the bounds stated above) would yield an optimal algorithm for Integer Sorting, which remains an important open problem. Finally, a key technical ingredient for our first main result is an efficient algorithm for generating Truncated Geometric random variates in $O(1)$ expected time in the Word RAM model.
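
To make the query semantics concrete, the following is a minimal sketch of a naive baseline, not the algorithm of this paper: it scans all of~$S$ and flips one independent coin per item, so each query costs $O(n)$ time rather than $O(1+\mu_S(\alpha,\beta))$. The function names, the dictionary representation of~$S$, and the handling of a zero denominator are assumptions made for illustration only.

```python
import random
from fractions import Fraction


def inclusion_probability(w, total, alpha, beta):
    """Exact p_x(alpha, beta) = min{ w / (alpha * total + beta), 1 }."""
    denom = alpha * total + beta
    if denom == 0:
        # Assumption (not specified in the text): with a zero denominator,
        # positive-weight items are always selected, zero-weight items never.
        return Fraction(1 if w > 0 else 0)
    return min(Fraction(w) / denom, Fraction(1))


def naive_pss_query(weights, alpha, beta, rng=random):
    """Naive O(n) query: one independent Bernoulli trial per item.

    `weights` maps each item to its non-negative integer weight;
    `alpha` and `beta` are non-negative rationals (int or Fraction).
    """
    total = sum(weights.values())
    sample = []
    for item, w in weights.items():
        p = inclusion_probability(w, total, alpha, beta)
        # Exact Bernoulli(p): draw an integer uniformly from
        # [0, denominator) and compare it against the numerator.
        if rng.randrange(p.denominator) < p.numerator:
            sample.append(item)
    return sample


def expected_size(weights, alpha, beta):
    """mu_S(alpha, beta): the sum of all inclusion probabilities."""
    total = sum(weights.values())
    return sum(
        (inclusion_probability(w, total, alpha, beta) for w in weights.values()),
        Fraction(0),
    )
```

For example, with weights $\{a: 1, b: 2, c: 7\}$ and $(\alpha, \beta) = (1/2, 0)$, the denominator is $5$, so the probabilities are $1/5$, $2/5$, and $\min\{7/5, 1\} = 1$, giving $\mu_S(\alpha,\beta) = 8/5$; item $c$ is returned by every query.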
