Clustering groups similar objects together while separating dissimilar ones, so that structures hidden in data can be identified and the data understood in an unsupervised manner. Traditional clustering methods such as k-means produce only a single clustering of a dataset. Deep clustering methods, such as auto-encoder based methods, have shown better performance but still produce a single clustering. However, a given dataset may have multiple clustering structures, each representing a unique perspective on the data. Multiple clustering methods have therefore been developed to discover multiple independent structures hidden in data. Although deep multiple clustering methods provide better performance, how to efficiently capture the alternative perspectives in data remains an open problem. In this paper, we propose AugDMC, a novel data Augmentation guided Deep Multiple Clustering method, to tackle this challenge. Specifically, AugDMC leverages data augmentations to automatically extract features related to a particular aspect of the data through self-supervised prototype-based representation learning, in which different data augmentations preserve different aspects of the data. Moreover, a stable optimization strategy is proposed to alleviate the instability caused by the different augmentations. Multiple clusterings, each based on a different aspect of the data, can then be obtained. Experiments on three real-world datasets, with comparisons against state-of-the-art methods, validate the effectiveness of the proposed method.
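The idea that different augmentations preserve different aspects of the data can be illustrated with a small toy sketch. This is not AugDMC itself: the abstract does not specify the network, loss, or optimization strategy, so everything below is an assumption made for illustration. The sketch assumes two-dimensional objects with a "shape" coordinate and a "color" coordinate, uses an augmentation that scrambles one coordinate, takes the mean over augmented views as a stand-in for a learned prototype, and swaps in a plain 2-means in place of deep representation learning. All function names (`make_toy_data`, `augment`, `prototypes`, `two_means`, `agreement`) are hypothetical.

```python
import numpy as np


def make_toy_data(n_per_group=25, noise=0.05, seed=0):
    """Toy objects with two independent binary aspects: shape and color."""
    rng = np.random.default_rng(seed)
    shape = np.repeat([0, 0, 1, 1], n_per_group)
    color = np.repeat([0, 1, 0, 1], n_per_group)
    x = np.stack([shape, color], axis=1).astype(float)
    x += noise * rng.standard_normal(x.shape)
    return x, shape, color


def augment(x, scrambled_dim, rng):
    """Hypothetical augmentation: replace one coordinate with uniform noise."""
    view = x.copy()
    view[:, scrambled_dim] = rng.uniform(0.0, 1.0, size=len(x))
    return view


def prototypes(x, scrambled_dim, n_views=200, seed=0):
    """Average many augmented views of each object (stand-in for a learned prototype).

    The scrambled coordinate averages out to roughly 0.5 for every object,
    so only the aspect the augmentation leaves untouched still separates them.
    """
    rng = np.random.default_rng(seed)
    views = [augment(x, scrambled_dim, rng) for _ in range(n_views)]
    return np.mean(views, axis=0)


def two_means(z, n_iter=20):
    """Plain 2-means with a deterministic farthest-point initialization."""
    c0 = z[0]
    c1 = z[np.argmax(np.linalg.norm(z - c0, axis=1))]
    centers = np.stack([c0, c1])
    for _ in range(n_iter):
        dist = np.linalg.norm(z[:, None, :] - centers[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        centers = np.stack([z[labels == k].mean(axis=0) for k in range(2)])
    return labels


def agreement(a, b):
    """Fraction of matching labels, maximized over the two label permutations."""
    same = np.mean(a == b)
    return max(same, 1.0 - same)


x, shape, color = make_toy_data()
by_shape = two_means(prototypes(x, scrambled_dim=1))  # color scrambled
by_color = two_means(prototypes(x, scrambled_dim=0))  # shape scrambled
print("shape clustering vs. shape labels:", agreement(by_shape, shape))
print("color clustering vs. color labels:", agreement(by_color, color))
```

In this toy setting, the same data yields two different clusterings depending only on which augmentation is used, which is the intuition behind obtaining multiple clusterings from different augmentations. A real implementation would replace the averaged views with learned representations and would need the stabilization that the abstract mentions.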