The $K$-medoids problem is a challenging combinatorial clustering task that is widely used in data analysis applications. Although numerous algorithms have been proposed for it, none of them can obtain an exact (globally optimal) solution in polynomial time. In this paper, we present EKM, a novel algorithm that solves the problem exactly with worst-case $O\left(N^{K+1}\right)$ time complexity. EKM is derived through formal program derivation steps, building on recent advances in transformational programming and combinatorial generation, so the resulting algorithm is provably correct by construction. We demonstrate its effectiveness by comparing it against various approximate methods on numerous real-world datasets. On synthetic datasets, the wall-clock run time of our algorithm matches the worst-case time complexity analysis and clearly outperforms benchmark branch-and-bound-based MIP solvers, whose time complexity is exponential. To our knowledge, this is the first rigorously proven, practical polynomial-time algorithm for this ubiquitous problem.
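To make the problem statement and the $O\left(N^{K+1}\right)$ bound concrete, the following is a minimal sketch of a naive exhaustive baseline. It is not the EKM algorithm, whose derivation is not reproduced here. The function name `exact_k_medoids_bruteforce` and the precomputed distance matrix `dist` are assumptions made for illustration. The sketch enumerates all $\binom{N}{K}$ candidate medoid sets and scores each in $O(NK)$ time, which for fixed $K$ gives $O\left(N^{K+1}\right)$ work overall.

```python
from itertools import combinations


def exact_k_medoids_bruteforce(dist, k):
    """Return (medoid_indices, total_cost) minimizing the K-medoids objective.

    Illustrative baseline only (hypothetical helper, not the paper's EKM).
    `dist` is assumed to be an N x N matrix of pairwise dissimilarities.
    """
    n = len(dist)
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= N")

    best_cost = float("inf")
    best_medoids = None
    # Enumerate every K-subset of the N points as a candidate medoid set.
    for medoids in combinations(range(n), k):
        # Objective: each point is assigned to its nearest medoid.
        cost = sum(min(dist[i][m] for m in medoids) for i in range(n))
        if cost < best_cost:
            best_cost = cost
            best_medoids = medoids
    return best_medoids, best_cost


# Usage: six points on a line forming two well-separated groups.
pts = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in pts] for a in pts]
medoids, cost = exact_k_medoids_bruteforce(dist, 2)
print(medoids, cost)  # (1, 4) 4
```

Because every candidate set is examined, the returned solution is globally optimal for the given dissimilarities. The cost of that guarantee is the rapid growth of $\binom{N}{K}$ with $K$, so this baseline is practical only for small $K$.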