In many data analysis tasks, a basic yet time-consuming step is to produce a large number of solutions and feed them into downstream processing. Various enumeration algorithms have been developed for this purpose. An enumeration algorithm produces all solutions of a problem instance without repetition. For the output to be a statistically meaningful representation of the solution space, the solutions must be enumerated in uniformly random order. This paper studies self-reducible NP-problems in three hierarchies: problems that are polynomially countable ($Sr_{NP}^{FP}$), problems that admit an FPTAS ($Sr_{NP}^{FPTAS}$), and problems that admit an FPRAS ($Sr_{NP}^{FPRAS}$). The trivial algorithm based on an (almost) uniform generator is in fact inefficient. We provide the new insight that the (almost) uniform generator is not the end of the story, and we propose more efficient algorithmic frameworks that enumerate solutions in uniformly random order for problems in these three hierarchies. (1) For problems in $Sr_{NP}^{FP}$, we show a random-order enumeration algorithm with polynomial delay (PDREnum). (2) For problems in $Sr_{NP}^{FPTAS}$, we show a Las Vegas random-order enumeration algorithm with expected polynomial delay (PDLVREnum). (3) For problems in $Sr_{NP}^{FPRAS}$, we devise a fully polynomial delay Atlantic City random-order enumeration algorithm (FPACREnum) whose expected delay is polynomial in the input size and the given error probability $\delta$, and which behaves as a Las Vegas random-order enumeration algorithm with probability at least $1-\delta$. Finally, to further improve the efficiency of the random-order enumeration algorithms, we present a parallelization based on the master/slave paradigm with 1.5-optimal enumeration delay and running time, along with its theoretical analysis.
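To make the notion of random-order enumeration concrete, the following is a minimal sketch for one toy problem that is self-reducible and polynomially countable: binary strings of length $n$ with no two adjacent 1s. It is not the paper's PDREnum algorithm. It combines exact counting with a lazy Fisher-Yates shuffle over solution ranks, so that every solution is output exactly once and every output order is equally likely. All function names (`build_counts`, `unrank`, `random_order_enum`) are hypothetical and chosen only for illustration.

```python
import random


def build_counts(n):
    """cnt[k][p]: number of valid completions with k bits left,
    where p is the previous bit (no two adjacent 1s allowed)."""
    cnt = [[1, 1]]
    for _ in range(n):
        prev0, prev1 = cnt[-1]
        # After a 0 the next bit may be 0 or 1; after a 1 it must be 0.
        cnt.append([prev0 + prev1, prev0])
    return cnt


def unrank(r, n, cnt):
    """Map a rank r in [0, cnt[n][0]) to the r-th valid string,
    fixing one bit at a time (the self-reduction step)."""
    bits = []
    for k in range(n, 0, -1):
        zeros = cnt[k - 1][0]  # completions if the next bit is 0
        if r < zeros:
            bits.append("0")
        else:
            # Only reachable when the previous bit was 0.
            r -= zeros
            bits.append("1")
    return "".join(bits)


def random_order_enum(n, rng=random):
    """Yield every valid string exactly once, in uniformly random order."""
    cnt = build_counts(n)
    total = cnt[n][0]
    swapped = {}  # sparse view of a virtual array a[i] = i
    for i in range(total):
        j = rng.randrange(i, total)
        ri = swapped.get(i, i)
        rj = swapped.get(j, j)
        swapped[j] = ri       # a[j] takes the old a[i]
        swapped.pop(i, None)  # position i is never read again
        yield unrank(rj, n, cnt)


if __name__ == "__main__":
    for s in random_order_enum(4, random.Random(0)):
        print(s)
```

In this sketch the work between two consecutive outputs is one dictionary update plus one unranking pass of $n$ steps, whereas repeatedly calling a uniform generator and discarding repeats needs more and more draws as the set of unseen solutions shrinks. The dictionary can grow with the number of solutions already output, which is a simplification of this illustration and not a claim about the space usage of the paper's algorithms.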