We introduce a differentiable clustering method based on stochastic perturbations of minimum-weight spanning forests. This allows us to include clustering in end-to-end trainable pipelines, with efficient gradients. We show that our method performs well even in difficult settings, such as data sets with high noise and challenging geometries. We also formulate an ad hoc loss to efficiently learn from partial clustering data using this operation. We demonstrate its performance on several data sets for supervised and semi-supervised tasks.
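To make the core operation concrete, the following is a minimal sketch of the forward pass only, under stated assumptions: a minimum-weight spanning forest with k components is obtained by stopping Kruskal's algorithm once k components remain, and the smoothed clustering output is approximated by averaging cluster connectivity matrices over Gaussian perturbations of the edge weights. The function names (`spanning_forest_clusters`, `connectivity`, `perturbed_connectivity`) and the parameters `sigma` and `n_samples` are hypothetical, chosen for illustration. The gradient estimator and the loss for partial clustering data are not shown.

```python
import numpy as np


def spanning_forest_clusters(weights, k):
    """Run Kruskal's algorithm until k components remain; return one label per point.

    `weights` is a symmetric (n, n) matrix of edge costs; assumes 1 <= k <= n.
    """
    n = weights.shape[0]
    parent = list(range(n))

    def find(i):
        # Union-find lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Only the strict upper triangle is read: one entry per undirected edge.
    iu, ju = np.triu_indices(n, k=1)
    order = np.argsort(weights[iu, ju], kind="stable")

    components = n
    for e in order:
        if components == k:
            break
        ri, rj = find(int(iu[e])), find(int(ju[e]))
        if ri != rj:
            parent[ri] = rj  # adding this edge merges two trees of the forest
            components -= 1
    return np.array([find(i) for i in range(n)])


def connectivity(labels):
    """Binary matrix M with M[i, j] = 1 when points i and j share a cluster."""
    return (labels[:, None] == labels[None, :]).astype(float)


def perturbed_connectivity(weights, k, sigma=0.1, n_samples=200, seed=0):
    """Monte Carlo average of connectivity matrices under Gaussian weight noise."""
    rng = np.random.default_rng(seed)
    n = weights.shape[0]
    total = np.zeros((n, n))
    for _ in range(n_samples):
        noise = rng.normal(size=(n, n))  # only the upper triangle is used
        labels = spanning_forest_clusters(weights + sigma * noise, k)
        total += connectivity(labels)
    return total / n_samples


# Toy example: five points on a line, pairwise distances as edge weights.
points = np.array([0.0, 0.1, 0.2, 5.0, 5.1])
pairwise = np.abs(points[:, None] - points[None, :])
soft_membership = perturbed_connectivity(pairwise, k=2, sigma=0.1)
```

Each entry of `soft_membership` lies in [0, 1] and estimates how often two points fall in the same cluster under perturbation. Unlike the hard connectivity matrix, this average varies smoothly with the weights, which is what makes the clustering step usable inside a trainable pipeline.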