Unsupervised clustering under domain shift (UCDS) studies how to transfer knowledge from abundant unlabeled data in multiple source domains to learn representations of unlabeled data in a target domain. In this paper, we introduce Prototype-oriented Clustering with Distillation (PCD), which not only improves the performance and applicability of existing UCDS methods but also addresses concerns about protecting the privacy of both the data and the models of the source domains. PCD first constructs a source clustering model by aligning the distributions of prototypes and data. It then distills knowledge into the target model through cluster labels provided by the source model while simultaneously clustering the target data. Finally, it refines the target model on target-domain data without guidance from the source model. Experiments across multiple benchmarks demonstrate the effectiveness and generalizability of our source-private clustering method.
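To make the three stages concrete, here is a minimal Python sketch of the pipeline as the abstract describes it. It is an illustration under stated assumptions, not the authors' implementation: plain k-means-style prototype updates stand in for the prototype-data distribution alignment, whose exact objective the abstract does not give, and the distillation stage is reduced to a single update from source-provided cluster labels rather than a joint distillation and clustering objective. All function names (`assign`, `update`, `fit_prototypes`, `pcd_sketch`) are hypothetical.

```python
import numpy as np


def assign(x, prototypes):
    """Return the index of the nearest prototype for every row of x."""
    d = ((x[:, None, :] - prototypes[None, :, :]) ** 2).sum(axis=2)
    return d.argmin(axis=1)


def update(x, labels, prototypes):
    """Move each prototype to the mean of its assigned points (unchanged if empty)."""
    new = prototypes.copy()
    for k in range(len(prototypes)):
        members = x[labels == k]
        if len(members) > 0:
            new[k] = members.mean(axis=0)
    return new


def fit_prototypes(x, prototypes, n_iter=20):
    """Run a fixed number of assign/update rounds from the given prototypes."""
    for _ in range(n_iter):
        prototypes = update(x, assign(x, prototypes), prototypes)
    return prototypes


def pcd_sketch(source_x, target_x, n_clusters, seed=0):
    """Three-stage sketch: source model, label distillation, target-only refinement."""
    rng = np.random.default_rng(seed)

    # Stage 1 (assumption: k-means as a stand-in for prototype/data alignment).
    init = source_x[rng.choice(len(source_x), n_clusters, replace=False)]
    source_protos = fit_prototypes(source_x, init)

    # Stage 2: the source model is queried only for cluster labels on target data;
    # the target prototypes start from target points, not from source parameters.
    teacher_labels = assign(target_x, source_protos)
    target_init = target_x[rng.choice(len(target_x), n_clusters, replace=False)]
    target_protos = update(target_x, teacher_labels, target_init)

    # Stage 3: refine on target data alone, with no guidance from the source model.
    target_protos = fit_prototypes(target_x, target_protos)
    return assign(target_x, target_protos), target_protos
```

In this sketch the target side never reads the source data or the source prototypes directly; it only consumes `teacher_labels`, which mirrors the source-private setting the abstract describes. A call such as `pcd_sketch(source_x, target_x, n_clusters=2)` returns the target cluster assignments together with the refined target prototypes.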