Trees are convenient models for obtaining explainable predictions on relatively small datasets. Although there are many proposals for the end-to-end construction of such trees in supervised learning, learning a tree end-to-end for clustering, without labels, remains an open challenge. Whereas most existing work uses trees to interpret the output of another clustering algorithm post hoc, we present Kauri, a novel end-to-end trained unsupervised binary tree for clustering. The method greedily maximises the kernel KMeans objective without requiring the definition of centroids. We compare this model with recent unsupervised trees on multiple datasets and show that Kauri performs identically when using a linear kernel. For other kernels, Kauri often outperforms the concatenation of kernel KMeans and a CART decision tree.
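The centroid-free formulation of the kernel KMeans objective mentioned above can be sketched directly from a Gram matrix: minimising within-cluster variance in feature space is equivalent to maximising the per-cluster mean pairwise similarity, with no explicit centroids. The following is a minimal illustration under that standard identity, not Kauri's actual training code.

```python
import numpy as np

def kernel_kmeans_objective(K, labels):
    """Centroid-free kernel KMeans objective:
    sum over clusters k of (1/|C_k|) * sum_{i,j in C_k} K[i, j].
    Since sum_i K[i, i] is constant, maximising this quantity is
    equivalent to minimising the within-cluster distortion in the
    kernel's feature space, without ever computing a centroid."""
    total = 0.0
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)
        total += K[np.ix_(idx, idx)].sum() / len(idx)
    return total

# Toy check with a linear kernel: two tight groups on a line.
X = np.array([[0.0], [0.1], [5.0], [5.1]])
K = X @ X.T  # linear-kernel Gram matrix
good = kernel_kmeans_objective(K, np.array([0, 0, 1, 1]))  # natural split
bad = kernel_kmeans_objective(K, np.array([0, 1, 0, 1]))   # mixed split
```

On this toy example the natural partition scores strictly higher, which is exactly the signal a greedy tree-growing procedure can exploit when choosing splits.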
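The baseline described as the concatenation of kernel KMeans and a CART decision tree can be sketched as a two-stage pipeline: cluster first, then fit a tree to mimic the cluster labels. In this sketch, assuming scikit-learn, plain KMeans stands in for kernel KMeans with a linear kernel; the dataset and hyperparameters are illustrative, not those of the paper.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.tree import DecisionTreeClassifier

X, _ = make_blobs(n_samples=300, centers=3, random_state=0)

# Stage 1: unsupervised clustering
# (plain KMeans coincides with kernel KMeans under a linear kernel).
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Stage 2: a CART tree trained post hoc to reproduce the cluster labels,
# yielding axis-aligned, explainable decision rules.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, labels)
agreement = float((tree.predict(X) == labels).mean())
```

Because the tree is fit after the fact, its splits are never informed by the clustering objective itself; this is the gap an end-to-end trained tree such as Kauri is designed to close.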