We present the first mini-batch algorithm for maximizing a non-negative monotone decomposable submodular function, $F=\sum_{i=1}^N f^i$, under a set of constraints. We consider two sampling approaches: uniform and weighted. We first show that mini-batch with weighted sampling improves over the state-of-the-art sparsifier-based approach both in theory and in practice. Surprisingly, our experimental results show that uniform sampling is superior to weighted sampling; however, this phenomenon cannot be explained by worst-case analysis. Our main contribution is using smoothed analysis to provide a theoretical foundation for our experimental results: we show that, under very mild assumptions, uniform sampling is superior for both the mini-batch and the sparsifier approaches, and we empirically verify that these assumptions hold for our datasets. Uniform sampling is simple to implement and has complexity independent of $N$, making it an ideal candidate for tackling massive real-world datasets.
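To illustrate the idea at a high level, the following is a minimal sketch (not the paper's algorithm) of mini-batch greedy maximization under a cardinality constraint: at each greedy step, the marginal gain of $F=\sum_{i=1}^N f^i$ is estimated on a uniformly sampled mini-batch of the $N$ component functions, rescaled by $N/m$ to keep the estimate unbiased. All function and variable names here are illustrative assumptions.

```python
import random

def minibatch_greedy(components, ground_set, k, batch_size, seed=0):
    """Greedily build S with |S| <= k, estimating the marginal gain of
    F(S) = sum_i f_i(S) on a uniform mini-batch of components per step.
    (Illustrative sketch only; not the algorithm from the paper.)"""
    rng = random.Random(seed)
    S = set()
    for _ in range(k):
        # Uniform sampling without replacement over the N components.
        batch = rng.sample(components, batch_size)
        scale = len(components) / batch_size  # unbiased rescaling by N/m
        best, best_gain = None, -1.0
        for e in ground_set - S:
            # Estimated marginal gain of adding e, averaged over the batch.
            gain = scale * sum(f(S | {e}) - f(S) for f in batch)
            if gain > best_gain:
                best, best_gain = e, gain
        if best is None:
            break
        S.add(best)
    return S

# Toy decomposable coverage instance: f_i(S) = 1 iff S hits group A_i.
groups = [{0, 1}, {1, 2}, {3}, {3, 4}]
comps = [(lambda A: (lambda S: 1.0 if S & A else 0.0))(A) for A in groups]
S = minibatch_greedy(comps, {0, 1, 2, 3, 4}, k=2, batch_size=4)
```

With `batch_size` equal to $N$ this reduces to exact greedy; shrinking the batch trades estimation variance for per-step cost independent of $N$, which is the regime the abstract targets.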