GreedyML: A Parallel Algorithm for Maximizing Submodular Functions

We describe a parallel approximation algorithm for maximizing monotone submodular functions subject to hereditary constraints on distributed-memory multiprocessors. Our work is motivated by the need to solve submodular optimization problems on massive data sets, for practical applications in areas such as data summarization, machine learning, and graph sparsification. It builds on the randomized distributed RandGreedI algorithm proposed by Barbosa, Ene, Nguyen, and Ward (2015). RandGreedI computes a distributed solution by randomly partitioning the data among all the processors and then employing a single accumulation step in which all processors send their partial solutions to one processor. For large problems, however, the accumulated partial solutions could exceed the memory available on that processor, and the processor that performs the accumulation could become a computational bottleneck. Here, we propose a generalization of the RandGreedI algorithm, called GreedyML, that employs multiple accumulation steps to reduce the memory required. We analyze the approximation ratio and the time complexity of the algorithm in the bulk synchronous parallel (BSP) model. We also evaluate the GreedyML algorithm on three classes of problems and report results from massive data sets with millions of elements. The results show that the GreedyML algorithm can solve problems where the sequential Greedy and distributed RandGreedI algorithms fail due to memory constraints. For certain computationally intensive problems, the GreedyML algorithm can be faster than the RandGreedI algorithm. On these problems, the observed approximation quality of the solutions computed by the GreedyML algorithm closely matches that of the solutions obtained by the RandGreedI algorithm.
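The partition-then-accumulate structure described above can be illustrated with a small sequential sketch. It is not the authors' implementation: it simulates the processors in a single Python process, uses set coverage as an example monotone submodular objective, and uses a cardinality bound `k` as the hereditary constraint. The function names, the `branching` parameter, and the rule that each accumulation node keeps the better of its children's solutions and a greedy solution over their union are all assumptions made for illustration; the paper's accumulation tree and selection rule may differ.

```python
import random


def coverage(sets, chosen):
    """Example monotone submodular objective: number of items covered."""
    covered = set()
    for i in chosen:
        covered |= sets[i]
    return len(covered)


def greedy(sets, candidates, k):
    """Sequential greedy under the cardinality constraint |S| <= k."""
    solution, covered = [], set()
    remaining = set(candidates)
    while remaining and len(solution) < k:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=lambda i: len(sets[i] - covered))
        if not sets[best] - covered:
            break  # no remaining candidate adds any coverage
        solution.append(best)
        covered |= sets[best]
        remaining.remove(best)
    return solution


def multilevel_greedy(sets, k, num_parts, branching, seed=0):
    """Random partition, local greedy, then repeated accumulation steps.

    Each accumulation node merges the solutions of at most `branching`
    children, so no node ever holds more than branching * k candidates.
    """
    if num_parts < 1 or branching < 2:
        raise ValueError("need num_parts >= 1 and branching >= 2")
    rng = random.Random(seed)
    parts = [[] for _ in range(num_parts)]
    for i in range(len(sets)):
        parts[rng.randrange(num_parts)].append(i)

    # Leaves: every simulated processor runs greedy on its own partition.
    level = [greedy(sets, part, k) for part in parts]

    # Internal levels: accumulate groups of `branching` partial solutions.
    while len(level) > 1:
        next_level = []
        for start in range(0, len(level), branching):
            group = level[start:start + branching]
            merged = sorted({i for sol in group for i in sol})
            candidates = group + [greedy(sets, merged, k)]
            next_level.append(max(candidates, key=lambda s: coverage(sets, s)))
        level = next_level
    return level[0]
```

In this sketch, setting `branching` equal to `num_parts` collapses the tree to a single accumulation step, which mirrors the one-step structure attributed to RandGreedI above, while a smaller `branching` trades extra levels for a smaller candidate set at each accumulating node.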
