MapReduce (MR) frameworks for maximizing a monotone submodular function subject to a cardinality constraint (SMCC) have so far been shown to work only with linear-adaptive (non-parallelizable) algorithms, which require a large number of distributions to utilize the available processors. This severely restricts the cardinality constraint and limits scalability. Existing low-adaptive algorithms do not satisfy the requirements of these distributed MR frameworks, thereby limiting their performance. We study the SMCC problem in a distributed setting and propose the first MR algorithms with sublinear adaptive complexity. Our algorithms, R-DASH, T-DASH, and G-DASH, provide $0.316-\varepsilon$, $3/8 -\varepsilon$, and $1 - 1/e -\varepsilon$ approximation ratios, respectively, with nearly optimal adaptive complexity and nearly linear time complexity. Additionally, we provide a framework that, under some mild assumptions, increases the maximum permissible cardinality constraint from the $O( n / \ell^2)$ of prior MR algorithms to $O( n / \ell )$, where $n$ is the data size and $\ell$ is the number of machines; under a stronger condition on the objective function, we increase the maximum constraint value to $n$. Finally, we provide empirical evidence that our sublinear-adaptive, distributed algorithms run orders of magnitude faster than current state-of-the-art distributed algorithms.
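To make the term "linear-adaptive" concrete, the following is a minimal sketch of the classical sequential greedy algorithm for SMCC, which is known to achieve a $1 - 1/e$ approximation ratio. It is not R-DASH, T-DASH, or G-DASH, and it is not distributed. The coverage objective, the names `coverage` and `greedy_smcc`, and the toy instance `EXAMPLE_SETS` are assumptions chosen for illustration only. Each of the $k$ rounds depends on the selection made in the previous round, so the rounds cannot run in parallel; this sequential dependence is what low-adaptive algorithms aim to reduce.

```python
from typing import Dict, List, Set


def coverage(sets: Dict[str, Set[int]], chosen: List[str]) -> int:
    """Illustrative monotone submodular objective: number of items covered."""
    covered: Set[int] = set()
    for name in chosen:
        covered |= sets[name]
    return len(covered)


def greedy_smcc(sets: Dict[str, Set[int]], k: int) -> List[str]:
    """Classical greedy: pick up to k sets, one per sequential (adaptive) round."""
    chosen: List[str] = []
    for _ in range(min(k, len(sets))):
        base = coverage(sets, chosen)
        best, best_gain = None, 0
        # Marginal gains within one round are independent of each other,
        # but the round itself must wait for the previous round's pick.
        for name in sets:
            if name in chosen:
                continue
            gain = coverage(sets, chosen + [name]) - base
            if gain > best_gain:
                best, best_gain = name, gain
        if best is None:  # no remaining set adds any coverage
            break
        chosen.append(best)
    return chosen


# Hypothetical toy instance, not taken from the paper.
EXAMPLE_SETS: Dict[str, Set[int]] = {
    "a": {1, 2, 3},
    "b": {3, 4},
    "c": {4, 5, 6, 7},
    "d": {1, 7},
}

if __name__ == "__main__":
    solution = greedy_smcc(EXAMPLE_SETS, k=2)
    print(solution, coverage(EXAMPLE_SETS, solution))
```

With cardinality constraint $k$, this baseline performs $k$ dependent rounds, so its adaptive complexity grows linearly in $k$, in contrast to the sublinear adaptive complexity the abstract claims for the proposed algorithms.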