Clustering is usually viewed as an unsupervised technique, but in practice it requires guidance to uncover meaningful structure. We formalize this with guided clustering, a paradigm that uses a guiding variable to steer the discovery process, and introduce the Guided Clustering Variational Autoencoder (GCVAE) as its deep generative realization. GCVAE learns a latent space structured as a Gaussian Mixture Model by optimizing a variational objective that forces the representation to be maximally informative about the guiding variable. The resulting clustering can be reoriented by changing the guiding variable, yielding clusters that are meaningful for the specified context. Experiments on a public dataset (MNIST-SVHN) and on proprietary data from connected health devices demonstrate GCVAE's ability to discover coherent, task-relevant clusters in complex settings.
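To make the idea of a latent space structured as a Gaussian Mixture Model concrete, the sketch below shows how soft cluster assignments could be read off latent codes once mixture parameters are available. It is only an illustration under stated assumptions, not the GCVAE objective or the authors' implementation: the function name `gmm_responsibilities`, the diagonal covariances, and the toy parameter values are all hypothetical, and the encoder and the guiding variable are left out entirely.

```python
import numpy as np


def gmm_responsibilities(z, means, variances, weights):
    """Soft cluster assignments p(c | z) for diagonal-covariance Gaussian components.

    z: (n, d) latent codes; means, variances: (k, d); weights: (k,) summing to 1.
    All names and shapes here are illustrative assumptions.
    """
    z = np.asarray(z, dtype=float)[:, None, :]                   # (n, 1, d)
    means = np.asarray(means, dtype=float)[None, :, :]           # (1, k, d)
    variances = np.asarray(variances, dtype=float)[None, :, :]   # (1, k, d)
    # Log density of every latent code under every mixture component.
    log_pdf = -0.5 * np.sum(
        np.log(2.0 * np.pi * variances) + (z - means) ** 2 / variances, axis=2
    )                                                            # (n, k)
    log_joint = log_pdf + np.log(np.asarray(weights, dtype=float))[None, :]
    # Subtract the row-wise maximum before exponentiating, for numerical stability.
    log_joint = log_joint - log_joint.max(axis=1, keepdims=True)
    resp = np.exp(log_joint)
    return resp / resp.sum(axis=1, keepdims=True)


# Toy example with hypothetical values: two components in a 2-D latent space.
means = np.array([[-2.0, 0.0], [2.0, 0.0]])
variances = np.ones((2, 2))
weights = np.array([0.5, 0.5])
z = np.array([[-2.1, 0.3], [1.8, -0.2], [0.0, 0.0]])

resp = gmm_responsibilities(z, means, variances, weights)
clusters = resp.argmax(axis=1)  # hard assignment: most probable component per point
```

In a model of the kind the abstract describes, the latent codes would come from a trained encoder and the mixture parameters would be learned jointly with it; here they are fixed by hand so that the assignment step can be inspected in isolation.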