Constrained clustering integrates domain knowledge through pairwise constraints. However, existing deep constrained clustering (DCC) methods are either constrained by the anchors inherent in end-to-end modeling or struggle to learn discriminative Euclidean embeddings, which restricts their scalability and real-world applicability. To avoid these respective pitfalls, we propose a novel angular constraint embedding approach for DCC, termed SpherePair. Using the SpherePair loss with a geometric formulation, our method faithfully encodes pairwise constraints and yields embeddings that are clustering-friendly in angular space, effectively separating representation learning from clustering. SpherePair preserves pairwise relations without conflict, removes the need to specify the exact number of clusters, generalizes to unseen data, enables rapid inference of the number of clusters, and is supported by rigorous theoretical guarantees. Comparative evaluations against state-of-the-art DCC methods on diverse benchmarks, along with empirical validation of the theoretical insights, confirm its superior performance, scalability, and overall real-world effectiveness. Code is available at \href{https://github.com/spherepaircc/SpherePairCC/tree/main}{our repository}.
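To make the idea of "encoding pairwise constraints in angular space" concrete, here is a minimal sketch of a generic cosine-based must-link / cannot-link loss on L2-normalized embeddings. It is an illustration under stated assumptions, not the SpherePair loss, whose exact geometric formulation is given in the paper. The function names `normalize` and `angular_pair_loss` and the `margin` parameter are hypothetical.

```python
import numpy as np

def normalize(z, eps=1e-12):
    """Project embeddings onto the unit hypersphere (row-wise L2 normalization)."""
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def angular_pair_loss(z, pairs, labels, margin=0.0):
    """Illustrative pairwise loss in angular space (hypothetical; NOT the SpherePair loss).

    z      : (n, d) raw embeddings
    pairs  : (m, 2) integer indices of constrained pairs
    labels : (m,) 1 for must-link, 0 for cannot-link
    margin : hypothetical cosine threshold that cannot-link pairs should stay below
    """
    u = normalize(z)
    # Cosine of the angle between the two embeddings of each constrained pair.
    cos = np.sum(u[pairs[:, 0]] * u[pairs[:, 1]], axis=1)
    ml = labels == 1
    # Must-link: penalize angular separation (1 - cos is 0 when directions coincide).
    ml_loss = 1.0 - cos[ml]
    # Cannot-link: penalize cosine similarity above the margin.
    cl_loss = np.maximum(0.0, cos[~ml] - margin)
    return float(np.concatenate([ml_loss, cl_loss]).mean())

z = np.array([[1.0, 0.0], [2.0, 0.0], [0.0, 3.0]])
pairs = np.array([[0, 1], [0, 2]])
print(angular_pair_loss(z, pairs, np.array([1, 0])))  # constraints satisfied
print(angular_pair_loss(z, pairs, np.array([0, 1])))  # both constraints violated
```

Because the loss depends only on directions, rescaling any embedding leaves it unchanged; this scale invariance is what distinguishes an angular formulation from a Euclidean-distance one.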