GreedyML: A Parallel Algorithm for Maximizing Submodular Functions

Note from arXiv: We have learned that we incorrectly applied Lemma 4.1 in Lemma 4.2. Thus, the proof of Lemma 4.2 is incorrect, and the overall approximation guarantee we claimed in Theorem 4.4 is not correct.

We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed-memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical applications in areas such as data summarization, machine learning, and graph sparsification. Our work builds on the randomized distributed RandGreedI algorithm, proposed by Barbosa, Ene, Nguyen, and Ward (2015). This algorithm computes a distributed solution by randomly partitioning the data among all the processors and then employing a single accumulation step in which all processors send their partial solutions to one processor. However, for large problems, the accumulated partial solutions could exceed the memory available on a processor, and the processor that performs the accumulation could become a computational bottleneck. Here, we propose a generalization of the RandGreedI algorithm that employs multiple accumulation steps to reduce the memory required. We analyze the approximation ratio and the time complexity of the algorithm (in the BSP model). We also evaluate the new GreedyML algorithm on three classes of problems, and report results from massive data sets with millions of elements. The results show that the GreedyML algorithm can solve problems where the sequential Greedy and distributed RandGreedI algorithms fail due to memory constraints. For certain computationally intensive problems, the GreedyML algorithm can be faster than the RandGreedI algorithm. The observed approximation quality of the solutions computed by the GreedyML algorithm closely matches that of the solutions obtained by the RandGreedI algorithm on these problems.
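The idea of replacing one accumulation step with several can be illustrated with a small sequential simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. It assumes a cardinality constraint (one example of a hereditary constraint) and a set-coverage objective (one example of a monotone submodular function). It also assumes that each accumulation node keeps the best of its children's solutions and a greedy solution over their union. The names `coverage`, `greedy`, `multilevel_greedy`, `num_parts`, and `branching` are hypothetical and do not come from the paper.

```python
import random


def coverage(sets, chosen):
    """Example monotone submodular objective: number of items covered."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, candidates, k):
    """Sequential greedy under the (assumed) cardinality constraint |S| <= k."""
    solution, covered = [], set()
    remaining = set(candidates)
    while len(solution) < k and remaining:
        # Pick the candidate with the largest marginal gain
        # (sorting makes tie-breaking deterministic).
        best = max(sorted(remaining), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining candidate adds anything
        solution.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return solution


def multilevel_greedy(sets, k, num_parts, branching, seed=0):
    """Simulate random partitioning followed by tree-shaped accumulation.

    `branching` is a hypothetical parameter: how many partial solutions
    are merged at each accumulation node. It must be at least 2.
    """
    if branching < 2:
        raise ValueError("branching must be at least 2")
    rng = random.Random(seed)
    # Randomly partition the ground set among the simulated processors.
    parts = [[] for _ in range(num_parts)]
    for i in range(len(sets)):
        parts[rng.randrange(num_parts)].append(i)
    # Leaf step: each simulated processor runs greedy on its own part.
    solutions = [greedy(sets, part, k) for part in parts]
    # Accumulation steps: merge groups of `branching` partial solutions
    # until a single solution remains.
    while len(solutions) > 1:
        merged = []
        for start in range(0, len(solutions), branching):
            group = solutions[start:start + branching]
            union = [i for sol in group for i in sol]
            candidate = greedy(sets, union, k)
            # Assumption: keep the best of the children and the merged run.
            merged.append(max(group + [candidate],
                              key=lambda s: coverage(sets, s)))
        solutions = merged
    return solutions[0]
```

Setting `branching` equal to `num_parts` collapses the tree to a single accumulation step, as in the RandGreedI description above. With a smaller `branching`, each accumulation node in this sketch holds at most `branching * k` elements instead of `num_parts * k`, which is the memory reduction the abstract refers to.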
