We introduce a differentiable clustering method based on minimum-weight spanning forests, a variant of spanning trees with several connected components. Our method relies on stochastic perturbations of solutions of linear programs, which smooth the operation and allow efficient gradient computation. This allows us to include clustering in end-to-end trainable pipelines. We show that our method performs well even in difficult settings, such as datasets with high noise and challenging geometries. We also formulate an ad hoc loss that uses this operation to learn efficiently from partial clustering data. We demonstrate its performance on several real-world datasets for supervised and semi-supervised tasks.
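To make the two ingredients concrete, here is a minimal sketch, not the authors' implementation. It assumes a complete graph with one weight per pair of points, uses Kruskal's algorithm stopped at `k` components as the spanning-forest solver, and adds Gaussian noise of scale `sigma` to the edge weights. The smoothed output is the Monte Carlo average of the cluster-connectivity matrix `M` (with `M[i][j] = 1` when `i` and `j` share a tree). Gradients are estimated with the standard Gaussian-smoothing (Stein) identity. The function names `k_spanning_forest` and `perturbed_forest` and the `upstream` argument are hypothetical.

```python
import random
from itertools import combinations


def k_spanning_forest(weights, k):
    """Minimum-weight spanning forest with k trees via Kruskal (assumes 1 <= k <= n).

    weights: symmetric n x n list of edge weights on the complete graph.
    Returns the n x n connectivity matrix M (M[i][j] = 1.0 if i and j share a tree).
    """
    n = len(weights)
    parent = list(range(n))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Visit edges from lightest to heaviest; each merge removes one component.
    edges = sorted(combinations(range(n), 2), key=lambda e: weights[e[0]][e[1]])
    components = n
    for i, j in edges:
        if components == k:
            break  # stop early: k trees remain
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            components -= 1
    roots = [find(i) for i in range(n)]
    return [[1.0 if roots[i] == roots[j] else 0.0 for j in range(n)] for i in range(n)]


def perturbed_forest(weights, k, sigma=0.5, num_samples=100, upstream=None, seed=0):
    """Monte Carlo estimate of E[M(weights + sigma * Z)], with Z ~ N(0, 1) per edge.

    If `upstream` (dL/dM, an n x n matrix) is given, also returns a Stein-type
    estimate of dL/dweights: (1 / (sigma * S)) * sum_s <upstream, M_s> * Z_s.
    Entry [i][j] (i != j) estimates the derivative w.r.t. edge weight (i, j).
    """
    rng = random.Random(seed)
    n = len(weights)
    total = [[0.0] * n for _ in range(n)]
    grad = [[0.0] * n for _ in range(n)] if upstream is not None else None
    for _ in range(num_samples):
        # One symmetric Gaussian perturbation of the edge weights.
        z = [[0.0] * n for _ in range(n)]
        for i, j in combinations(range(n), 2):
            z[i][j] = z[j][i] = rng.gauss(0.0, 1.0)
        noisy = [[weights[i][j] + sigma * z[i][j] for j in range(n)] for i in range(n)]
        m = k_spanning_forest(noisy, k)
        for i in range(n):
            for j in range(n):
                total[i][j] += m[i][j]
        if grad is not None:
            score = sum(upstream[i][j] * m[i][j] for i in range(n) for j in range(n))
            for i in range(n):
                for j in range(n):
                    grad[i][j] += score * z[i][j] / (sigma * num_samples)
    mean = [[v / num_samples for v in row] for row in total]
    return mean, grad
```

With points `[0.0, 0.1, 0.2, 5.0, 5.1]` and weights `abs(a - b)`, `k_spanning_forest(w, 2)` groups `{0, 1, 2}` and `{3, 4}`. A small `sigma` leaves this hard partition essentially unchanged. A larger `sigma` yields fractional co-membership values in `[0, 1]` that vary smoothly with the weights, which is what makes the operation usable inside a gradient-based pipeline.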