The bipartite graph structure has shown promise in facilitating subspace clustering and spectral clustering algorithms on large-scale datasets. To avoid k-means post-processing during bipartite graph partitioning, the constrained Laplacian rank (CLR) is often used to constrain the number of connected components (i.e., clusters) in the bipartite graph. This constraint, however, neglects the distribution (or normalization) of these connected components and may lead to imbalanced or even ill-formed clusters. Despite the significant success of normalized cut (Ncut) on general graphs, it remains, surprisingly, an open problem how to enforce a one-step normalized cut for bipartite graphs, especially with linear-time complexity. In this paper, we first characterize a novel one-step bipartite graph cut (OBCut) criterion with normalized constraints, and theoretically prove its equivalence to a trace maximization problem. We then extend this cut criterion to a scalable subspace clustering approach, in which adaptive anchor learning, bipartite graph learning, and one-step normalized bipartite graph partitioning are simultaneously modeled in a unified objective function, and we further design an alternating optimization algorithm that solves it in linear time. Experiments on a variety of general and large-scale datasets demonstrate the effectiveness and scalability of our approach.
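To make the link between a normalized cut and a trace objective concrete, the following minimal sketch evaluates the classical Ncut of a k-way partition of a small bipartite graph in two ways: directly from cut and volume, and through the standard trace identity Ncut = k - tr((YᵀDY)⁻¹YᵀWY). This illustrates the general Ncut identity only and is not the OBCut criterion or the algorithm proposed in the paper. The function names, the toy matrix `B`, and the labels are assumptions introduced for illustration.

```python
import numpy as np

def bipartite_ncut_direct(B, row_labels, col_labels, k):
    """Classical Ncut of a k-way partition of a bipartite graph.

    B is the n x m cross-affinity matrix (e.g. samples x anchors);
    row_labels and col_labels assign each side's nodes to clusters 0..k-1.
    Assumes every cluster has nonzero volume.
    """
    total = 0.0
    for c in range(k):
        r = row_labels == c
        q = col_labels == c
        inside = B[np.ix_(r, q)].sum()        # edge weight kept inside cluster c
        vol = B[r, :].sum() + B[:, q].sum()   # sum of degrees of cluster c
        total += (vol - 2.0 * inside) / vol   # cut(V_c) / vol(V_c)
    return total

def bipartite_ncut_trace(B, row_labels, col_labels, k):
    """Same quantity via the trace form k - tr((Y^T D Y)^{-1} Y^T W Y)."""
    n, m = B.shape
    # Full adjacency of the bipartite graph: no edges within either side.
    W = np.block([[np.zeros((n, n)), B], [B.T, np.zeros((m, m))]])
    d = W.sum(axis=1)
    # One-hot cluster indicator over all n + m nodes.
    Y = np.eye(k)[np.concatenate([row_labels, col_labels])]
    vol = Y.T @ (d[:, None] * Y)   # diagonal matrix of cluster volumes
    assoc = Y.T @ W @ Y            # within- and between-cluster weights
    return k - np.trace(np.linalg.solve(vol, assoc))

# Toy example (hypothetical data): two nearly disconnected blocks.
B = np.array([[1.0, 1.0, 0.0],
              [1.0, 1.0, 0.0],
              [0.0, 0.1, 1.0]])
row_labels = np.array([0, 0, 1])
col_labels = np.array([0, 0, 1])

ncut_direct = bipartite_ncut_direct(B, row_labels, col_labels, k=2)
ncut_trace = bipartite_ncut_trace(B, row_labels, col_labels, k=2)
print(ncut_direct, ncut_trace)
```

Because the two values agree, minimizing the normalized cut is the same as maximizing the trace term. The dense `W` is built here only for clarity; a linear-time method would work with `B` directly rather than forming the full (n + m) x (n + m) adjacency.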