The minimum sum-of-squares clustering (MSSC), or k-means type clustering, has recently been extended to exploit prior knowledge on the cardinality of each cluster. Such knowledge is used to improve both performance and solution quality. In this paper, we propose a global optimization approach based on the branch-and-cut technique to solve the cardinality-constrained MSSC. For the lower-bounding routine, we use the semidefinite programming (SDP) relaxation recently proposed by Rujeerapaiboon et al. [SIAM J. Optim. 29(2), 1211-1239, (2019)]. However, this relaxation can be used in a branch-and-cut method only for small instances. Therefore, we derive a new SDP relaxation that scales better with the instance size and the number of clusters. In both cases, we strengthen the bound by adding polyhedral cuts. Benefiting from a tailored branching strategy that enforces pairwise constraints, we reduce the complexity of the problems arising in the child nodes. For the upper bound, we present a local search procedure that exploits the solution of the SDP relaxation solved at each node. Computational results show that the proposed algorithm globally solves, for the first time, real-world instances of size 10 times larger than those solved by state-of-the-art exact methods.