Hypergraph clustering is a basic algorithmic primitive for analyzing complex datasets and systems characterized by multiway interactions, such as group email conversations, groups of co-purchased retail products, and co-authorship data. This paper presents a practical $O(\log n)$-approximation algorithm for a broad class of hypergraph ratio cut clustering objectives. This includes objectives involving generalized hypergraph cut functions, which allow a user to penalize cut hyperedges differently depending on the number of nodes in each cluster. Our method is a generalization of the cut-matching framework for graph ratio cuts, and relies only on solving maximum s-t flow problems in a special reduced graph. It is significantly faster than existing hypergraph ratio cut algorithms, while also solving a more general problem. In numerical experiments on various types of hypergraphs, we show that it quickly finds ratio cut solutions within a small factor of optimality.
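To make the notion of a generalized hypergraph cut function concrete, the following minimal sketch evaluates a ratio cut objective in which the penalty for a cut hyperedge depends on how many of its nodes fall on the smaller side of the split. It is an illustration only, not the paper's algorithm: the function names, the unit node weights, and the expansion-style denominator $\min(|S|, |\bar{S}|)$ are assumptions made for this example.

```python
from typing import Callable, Iterable, Sequence, Set


def generalized_cut(
    hyperedges: Iterable[Sequence[int]],
    S: Set[int],
    penalty: Callable[[int], float],
) -> float:
    """Sum of per-hyperedge penalties for the bipartition (S, complement).

    `penalty(i)` is the cost of a hyperedge with i nodes on its smaller
    side (hypothetical cardinality-based form; penalty(0) should be 0).
    """
    total = 0.0
    for e in hyperedges:
        k = sum(1 for v in e if v in S)      # nodes of e inside S
        total += penalty(min(k, len(e) - k))  # nodes on the smaller side
    return total


def ratio_cut(
    hyperedges: Iterable[Sequence[int]],
    nodes: Iterable[int],
    S: Iterable[int],
    penalty: Callable[[int], float],
) -> float:
    """Cut value divided by the size of the smaller cluster.

    Assumes unit node weights; other node weightings are possible.
    """
    S = set(S)
    complement = set(nodes) - S
    if not S or not complement:
        raise ValueError("S must be a nonempty proper subset of the nodes")
    return generalized_cut(hyperedges, S, penalty) / min(len(S), len(complement))


# Two example penalties (assumed for illustration).
def all_or_nothing(i: int) -> float:
    """Every cut hyperedge costs 1, however it is split."""
    return 1.0 if i >= 1 else 0.0


def linear(i: int) -> float:
    """Cost grows with the number of nodes on the smaller side."""
    return float(i)


nodes = range(6)
hyperedges = [(0, 1, 2), (1, 2, 3, 4), (2, 3, 4), (3, 4, 5)]
S = {0, 1, 2}

aon_value = ratio_cut(hyperedges, nodes, S, all_or_nothing)
lin_value = ratio_cut(hyperedges, nodes, S, linear)
```

Here the hyperedge `(1, 2, 3, 4)` is split evenly, so it costs 1 under the all-or-nothing penalty but 2 under the linear one. This is the kind of distinction a generalized cut function lets a user express.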