Generative model-based deep clustering frameworks excel at classifying complex data, but they handle dynamic and complex features poorly because they require the number of clusters to be known in advance. In this paper, we propose a nonparametric deep clustering framework that uses an infinite mixture of Gaussians as a prior. The framework employs a memoized online variational inference method that supports "birth" and "merge" moves of clusters, allowing it to cluster data in a "dynamic-adaptive" manner without prior knowledge of the number of features. We name the framework DIVA, a Dirichlet Process-based Incremental deep clustering framework via Variational Auto-Encoder. DIVA outperforms state-of-the-art baselines in classifying complex data with dynamically changing features, particularly in the case of incremental features. Our source code is available at: https://github.com/Ghiara/diva
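To make the "infinite mixture" prior more concrete, the following is a minimal sketch of the standard stick-breaking construction of Dirichlet process mixture weights. It is not taken from the DIVA repository: the function name `stick_breaking_weights`, the `truncation` level, and the concentration value `alpha` are illustrative assumptions, and a truncated draw only approximates the infinite prior.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw truncated stick-breaking weights for a Dirichlet process prior.

    Hypothetical helper for illustration only; not part of the DIVA code base.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to any component
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component takes the rest, so weights sum to 1
    return weights


# Assumed values: concentration alpha=1.0 and a truncation of 10 components.
weights = stick_breaking_weights(alpha=1.0, truncation=10)
```

Because the weights tend to shrink as the component index grows, only a small number of components carry meaningful mass for a given dataset, which is why the number of clusters need not be fixed in advance.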