We consider the fundamental problem of balanced $k$-means clustering. In particular, we introduce BalLOT, an optimal transport approach to alternating minimization, and we show that it delivers a fast and effective solution to this problem. We support this claim with a variety of numerical experiments and then prove several theoretical guarantees. First, we prove that for generic data, BalLOT produces integral couplings at each step. Next, we perform a landscape analysis to provide theoretical guarantees for both exact and partial recovery of planted clusters under the stochastic ball model. Finally, we propose initialization schemes that achieve one-step recovery of planted clusters.
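To make the alternating-minimization idea concrete, here is a minimal sketch of balanced $k$-means on a small data set. It is not the authors' BalLOT implementation. It assumes that the number of points $n$ is divisible by $k$ and that the marginals are uniform, in which case the transport step reduces to an assignment problem: each cluster contributes $n/k$ identical "slots", and an integral optimal coupling is a min-cost matching of points to slots. The sketch solves that matching with a textbook Hungarian algorithm as a stand-in for a general optimal transport solver. The names `solve_assignment`, `sqdist`, and `balanced_kmeans`, the random initialization, and the stopping rule are all illustrative assumptions.

```python
import random


def solve_assignment(cost):
    """Hungarian algorithm: min-cost perfect matching for a square cost matrix.

    Returns a list `col` in which row i is matched to column col[i].
    """
    n = len(cost)
    INF = float("inf")
    u = [0.0] * (n + 1)    # row potentials (1-indexed)
    v = [0.0] * (n + 1)    # column potentials (1-indexed)
    match = [0] * (n + 1)  # match[j] = row matched to column j (0 = none)
    way = [0] * (n + 1)    # back-pointers along the augmenting path
    for i in range(1, n + 1):
        match[0] = i
        j0 = 0
        minv = [INF] * (n + 1)
        used = [False] * (n + 1)
        while True:
            used[j0] = True
            i0, delta, j1 = match[j0], INF, 0
            for j in range(1, n + 1):
                if not used[j]:
                    cur = cost[i0 - 1][j - 1] - u[i0] - v[j]
                    if cur < minv[j]:
                        minv[j], way[j] = cur, j0
                    if minv[j] < delta:
                        delta, j1 = minv[j], j
            for j in range(n + 1):
                if used[j]:
                    u[match[j]] += delta
                    v[j] -= delta
                else:
                    minv[j] -= delta
            j0 = j1
            if match[j0] == 0:
                break
        while j0:  # flip the augmenting path
            j1 = way[j0]
            match[j0] = match[j1]
            j0 = j1
    col = [0] * n
    for j in range(1, n + 1):
        col[match[j] - 1] = j - 1
    return col


def sqdist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def balanced_kmeans(points, k, iters=20, seed=0):
    """Alternate between a balanced assignment step and a centroid update."""
    n = len(points)
    assert n % k == 0, "this sketch assumes k divides n"
    size = n // k
    rng = random.Random(seed)
    # Assumption: initialize centers at k distinct data points chosen at random.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = None
    for _ in range(iters):
        # Slot s belongs to cluster s // size, so every cluster owns `size` slots.
        cost = [[sqdist(p, centers[s // size]) for s in range(n)] for p in points]
        new_labels = [s // size for s in solve_assignment(cost)]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for j in range(k):
            members = [points[i] for i in range(n) if labels[i] == j]
            centers[j] = [sum(c) / size for c in zip(*members)]
    return labels, centers
```

For example, `balanced_kmeans([(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)], 2)` returns a label list in which each of the two clusters has exactly three points. The $n \times n$ cost matrix and the cubic-time matching make this suitable only for small illustrations; a dedicated optimal transport solver would be the natural choice at scale.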