Multiple clustering aims to discover various latent structures of data from different aspects. Deep multiple clustering methods have achieved remarkable performance by exploiting complex patterns and relationships in data. However, existing works struggle to adapt flexibly to diverse user-specific needs in data grouping, which may require manual understanding of each clustering. To address these limitations, we introduce Multi-Sub, a novel end-to-end multiple clustering approach that incorporates a multi-modal subspace proxy learning framework. Leveraging the synergistic capabilities of CLIP and GPT-4, Multi-Sub aligns textual prompts expressing user preferences with their corresponding visual representations. This is achieved by automatically generating proxy words from large language models that act as subspace bases, allowing data to be represented in terms specific to the user's interests. Our method consistently outperforms existing baselines across a broad set of datasets in visual multiple clustering tasks. Our code is available at https://github.com/Alexander-Yao/Multi-Sub.
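To make the subspace-proxy idea concrete, the following is a minimal NumPy sketch, not the paper's implementation: it assumes a set of proxy-word embeddings (e.g. from CLIP's text encoder, here random placeholders) has already been obtained, orthonormalizes them into a subspace basis, and expresses image embeddings in that basis so that clustering can proceed along the user-specified aspect. All names, dimensions, and data are illustrative.

```python
import numpy as np

def subspace_project(image_feats, proxy_embeds):
    """Express image features in the subspace spanned by proxy-word embeddings.

    image_feats:  (n, d) array of image embeddings.
    proxy_embeds: (k, d) array of proxy-word embeddings (the subspace bases).
    Returns an (n, k) array of subspace coordinates.
    """
    # Orthonormalize the proxy embeddings via QR to get a clean basis.
    Q, _ = np.linalg.qr(proxy_embeds.T)  # Q: (d, k), orthonormal columns
    # Coordinates of each image feature within the proxy subspace.
    return image_feats @ Q

# Illustrative stand-ins: 3 proxy words and 8 images in a 512-dim CLIP-like space.
rng = np.random.default_rng(0)
proxies = rng.normal(size=(3, 512))
imgs = rng.normal(size=(8, 512))
coords = subspace_project(imgs, proxies)
print(coords.shape)  # (8, 3)
```

Any standard clustering method (e.g. k-means) could then be run on `coords`, so that distances reflect only the aspect the proxy words encode rather than the full embedding space.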