We develop a family of distributed clustering algorithms that operate over networks of users. In the proposed scenario, each user holds a local dataset and communicates only with its immediate neighbours, with the aim of finding a clustering of the full, joint data. The proposed family, termed Distributed Gradient Clustering (DGC-$\mathcal{F}_\rho$), is parametrized by $\rho \geq 1$, which controls the proximity of users' center estimates, and by $\mathcal{F}$, which determines the clustering loss. Specialized to popular clustering losses such as $K$-means and the Huber loss, DGC-$\mathcal{F}_\rho$ gives rise to the novel distributed clustering algorithms DGC-KM$_\rho$ and DGC-HL$_\rho$, while a novel clustering loss based on the logistic function leads to DGC-LL$_\rho$. We provide a unified analysis and establish several strong results under mild assumptions. First, the sequence of centers generated by the methods converges to a well-defined notion of fixed point, for any center initialization and any value of $\rho$. Second, as $\rho$ increases, the family of fixed points produced by DGC-$\mathcal{F}_\rho$ converges to a notion of consensus fixed points. We show that consensus fixed points of DGC-$\mathcal{F}_{\rho}$ are equivalent to fixed points of gradient clustering over the full data, guaranteeing that a clustering of the full data is produced. For the special case of Bregman losses, we show that our fixed points converge to the set of Lloyd points. Numerical experiments on real data confirm our theoretical findings and demonstrate strong performance of the methods.
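To make the setting concrete, the following is a minimal sketch of one possible gradient-based update with the $K$-means loss. It is an illustration under stated assumptions, not the exact DGC-KM$_\rho$ recursion: the abstract does not give the update rule, so the form used here (a neighbour-consensus term plus the local loss gradient scaled by $1/\rho$), the step size `alpha`, the function name `dgc_km_step`, and the toy two-user network are all hypothetical.

```python
import numpy as np

def dgc_km_step(centers, data, neighbours, rho, alpha):
    """One synchronous update of every user's center estimates.

    Assumed update (hypothetical, not taken from the source):
        x_i <- x_i - alpha * (sum_{j in N_i} (x_i - x_j) + grad_i / rho)
    where grad_i is the gradient of user i's local K-means loss.
    """
    new_centers = []
    for i, (x_i, y_i) in enumerate(zip(centers, data)):
        # Assign each local point to the nearest of user i's center estimates.
        dists = np.linalg.norm(y_i[:, None, :] - x_i[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Gradient of the local loss 0.5 * ||x_k - y||^2, summed per cluster.
        grad = np.zeros_like(x_i)
        for k in range(x_i.shape[0]):
            grad[k] = (x_i[k] - y_i[labels == k]).sum(axis=0)
        # Consensus term pulling user i towards its neighbours' estimates.
        consensus = sum(x_i - centers[j] for j in neighbours[i])
        new_centers.append(x_i - alpha * (consensus + grad / rho))
    return new_centers

# Toy example: two connected users, two well-separated clusters in the joint data.
data = [
    np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]]),    # user 0
    np.array([[1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]),  # user 1
]
centers = [
    np.array([[1.0, 1.0], [9.0, 9.0]]),   # user 0 initial estimates
    np.array([[2.0, 0.0], [8.0, 12.0]]),  # user 1 initial estimates
]
neighbours = {0: [1], 1: [0]}

for _ in range(1000):
    centers = dgc_km_step(centers, data, neighbours, rho=10.0, alpha=0.2)

# Means of the two clusters in the joint data, for comparison.
joint = np.vstack(data)
global_means = np.array([joint[[0, 1, 3]].mean(axis=0),
                         joint[[2, 4, 5]].mean(axis=0)])
print(centers[0], centers[1], global_means, sep="\n")
```

In this toy run the two users end up with nearly identical center estimates, each close to the cluster means of the joint data, even though neither user sees the other's points. Under the assumed update, a larger `rho` down-weights the local loss relative to the consensus term, which is one way to read the claim that fixed points approach consensus as $\rho$ grows.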