We propose a general transfer learning framework for clustering, given a main dataset and an auxiliary dataset on the same subjects. The two datasets may reflect similar but not identical latent grouping structures of the subjects. Our adaptive transfer clustering (ATC) algorithm automatically leverages the commonality between the two datasets in the presence of an unknown discrepancy by optimizing an estimated bias-variance decomposition. It applies to a broad class of statistical models, including Gaussian mixture models, stochastic block models, and latent class models. Our theoretical analysis proves the optimality of ATC under the Gaussian mixture model and explicitly quantifies the benefit of transfer. Extensive simulations and real-data experiments confirm the method's effectiveness in various scenarios.