Biclustering, also called co-clustering, block clustering, or two-way clustering, involves the simultaneous clustering of both the rows and columns of a data matrix into distinct groups, such that the rows and columns within a group display similar patterns. As a model problem for biclustering, we consider the $k$-densest-disjoint biclique problem, whose goal is to identify $k$ disjoint complete bipartite subgraphs (called bicliques) of a given weighted complete bipartite graph such that the sum of their densities is maximized. To address this problem, we present a tailored branch-and-cut algorithm. For the upper bound routine, we consider a semidefinite programming relaxation and propose valid inequalities to strengthen the bound. We solve this relaxation in a cutting-plane fashion using a first-order method. For the lower bound, we design a maximum weight matching rounding procedure that exploits the solution of the relaxation solved at each node. Computational results on both synthetic and real-world instances show that the proposed algorithm can solve instances approximately 20 times larger than those handled by general-purpose solvers.
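To make the objective of the $k$-densest-disjoint biclique problem concrete, the following minimal sketch evaluates the sum of densities for a candidate set of disjoint bicliques. The abstract does not define density, so the sketch assumes one common convention: the total edge weight of the biclique divided by its number of vertices. The function names `biclique_density` and `sum_of_densities`, and the small weight matrix `W`, are hypothetical and are not taken from the paper.

```python
def biclique_density(W, rows, cols):
    """Density of the biclique induced by `rows` and `cols`.

    ASSUMPTION: density = total edge weight / number of vertices.
    The abstract does not state the definition used by the authors.
    """
    total = sum(W[i][j] for i in rows for j in cols)
    return total / (len(rows) + len(cols))


def sum_of_densities(W, bicliques):
    """Objective value for a list of (rows, cols) pairs.

    Raises ValueError if a biclique is empty on either side or if two
    bicliques share a row vertex or a column vertex (not disjoint).
    """
    used_rows, used_cols = set(), set()
    for rows, cols in bicliques:
        if not rows or not cols:
            raise ValueError("each biclique needs at least one row and one column")
        if used_rows & set(rows) or used_cols & set(cols):
            raise ValueError("bicliques must be disjoint")
        used_rows |= set(rows)
        used_cols |= set(cols)
    return sum(biclique_density(W, r, c) for r, c in bicliques)


# Hypothetical 3 x 4 weight matrix of a complete bipartite graph
# (rows = one side of the bipartition, columns = the other).
W = [
    [5.0, 4.0, 0.0, 1.0],
    [4.0, 5.0, 1.0, 0.0],
    [0.0, 1.0, 3.0, 3.0],
]

# Two candidate solutions with k = 2.
aligned = [([0, 1], [0, 1]), ([2], [2, 3])]  # follows the block structure of W
mixed = [([0, 2], [0, 1]), ([1], [2, 3])]    # ignores it

print(sum_of_densities(W, aligned))
print(sum_of_densities(W, mixed))
```

Under the assumed definition, the solution that follows the block structure of `W` scores higher than the one that mixes blocks, which is the behaviour a biclustering objective is meant to reward. Exhaustively enumerating such solutions is impractical beyond tiny instances, which is why an exact approach needs bounding machinery such as the branch-and-cut scheme described above.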