Constrained clustering is a semi-supervised task that uses a limited amount of labelled data, formulated as constraints, to incorporate domain-specific knowledge and significantly improve clustering accuracy. Previous work has considered exact optimization formulations that guarantee an optimal clustering while satisfying all constraints; however, these approaches lack interpretability. Recently, decision trees have been used to produce inherently interpretable clustering solutions, but existing approaches neither support clustering constraints nor provide strong theoretical guarantees on solution quality. In this work, we present a novel SAT-based framework for interpretable clustering that supports clustering constraints and provides strong theoretical guarantees on solution quality. We also present new insight into the trade-off between interpretability and the satisfaction of such user-provided constraints. Our framework is the first approach to clustering that is both interpretable and constrained. Experiments on a range of real-world and synthetic datasets demonstrate that our approach can produce high-quality, interpretable constrained clustering solutions.
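To make the tension between interpretability and constraint satisfaction concrete, the following is a minimal sketch, not the SAT-based framework itself. It assumes the constraints are the pairwise must-link and cannot-link constraints commonly used in constrained clustering (the abstract does not name the constraint types), and it uses a hand-written, hypothetical depth-2 decision tree whose leaves act as clusters. The thresholds, data points, and function names are all illustrative assumptions.

```python
from typing import List, Sequence, Tuple

Pair = Tuple[int, int]


def tree_assign(point: Sequence[float]) -> int:
    """Hypothetical depth-2 decision tree; each leaf is one cluster.

    The thresholds are made up for illustration, not learned.
    """
    if point[0] <= 2.0:
        return 0
    if point[1] <= 5.0:
        return 1
    return 2


def violated(labels: List[int],
             must_link: List[Pair],
             cannot_link: List[Pair]) -> List[Pair]:
    """Return every pairwise constraint that the labelling breaks."""
    # A must-link pair is violated when its points land in different clusters.
    bad_ml = [(i, j) for i, j in must_link if labels[i] != labels[j]]
    # A cannot-link pair is violated when its points share a cluster.
    bad_cl = [(i, j) for i, j in cannot_link if labels[i] == labels[j]]
    return bad_ml + bad_cl


# Toy 2-D data (assumed values).
points = [(1.0, 1.0), (1.5, 6.0), (3.0, 4.0), (4.0, 7.0), (3.5, 4.5)]
labels = [tree_assign(p) for p in points]

# Assumed user-provided constraints, given as index pairs into `points`.
must_link = [(0, 1), (2, 4), (1, 3)]
cannot_link = [(2, 3), (0, 2)]

broken = violated(labels, must_link, cannot_link)
print("labels:", labels)
print("violated constraints:", broken)
```

In this toy setting the small tree cannot place points 1 and 3 in the same leaf without also changing other assignments, so one must-link constraint stays violated. Satisfying it would require a different, possibly larger tree, which is the kind of trade-off the abstract refers to.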