Personalized multiple clustering aims to generate diverse partitions of a dataset, each reflecting a different user-specified aspect, rather than a single clustering. It has recently drawn research interest because it accommodates varying user preferences. Recent approaches primarily use CLIP embeddings with proxy learning to extract representations biased toward a user's clustering preferences. However, CLIP focuses on coarse image-text alignment and lacks a deep contextual understanding of user interests. To overcome these limitations, we propose an agent-centric personalized clustering framework in which multi-modal large language models (MLLMs) act as agents that comprehensively traverse a relational graph to search for clusters matching user interests. Owing to the advanced reasoning capabilities of MLLMs, the resulting clusters align more closely with user-defined criteria than those obtained from CLIP-based representations. To reduce computational overhead, we shorten the agents' traversal paths by constructing the relational graph from user-interest-biased embeddings extracted by MLLMs: a large number of weakly connected edges can be filtered out based on embedding similarity, which makes the agents' traversal search efficient. Experimental results show that the proposed method achieves NMI scores of 0.9667 and 0.9481 on the Card Order and Card Suits benchmarks, respectively, improving on the state-of-the-art model by over 140%.
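The graph-pruning and traversal idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names `build_relational_graph` and `traverse_clusters`, the cosine-similarity threshold, the breadth-first traversal order, and the `same_cluster` callback (a stand-in for an MLLM agent's judgment) are all assumptions made for illustration.

```python
import math
from collections import deque
from typing import Callable, Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def build_relational_graph(
    embeddings: Sequence[Sequence[float]], threshold: float
) -> Dict[int, List[int]]:
    """Keep only edges whose similarity reaches `threshold` (assumed rule),
    so weakly connected pairs never reach the agent."""
    n = len(embeddings)
    graph: Dict[int, List[int]] = {i: [] for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if cosine(embeddings[i], embeddings[j]) >= threshold:
                graph[i].append(j)
                graph[j].append(i)
    return graph


def traverse_clusters(
    graph: Dict[int, List[int]], same_cluster: Callable[[int, int], bool]
) -> List[int]:
    """Breadth-first traversal; `same_cluster` is a hypothetical placeholder
    for the agent's decision on whether two linked items belong together."""
    labels = [-1] * len(graph)
    next_label = 0
    for seed in range(len(graph)):
        if labels[seed] != -1:
            continue
        labels[seed] = next_label
        queue = deque([seed])
        while queue:
            node = queue.popleft()
            for neighbor in graph[node]:
                if labels[neighbor] == -1 and same_cluster(node, neighbor):
                    labels[neighbor] = next_label
                    queue.append(neighbor)
        next_label += 1
    return labels


# Toy embeddings (made up for illustration): two tight pairs.
toy = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_graph = build_relational_graph(toy, threshold=0.8)
toy_labels = traverse_clusters(toy_graph, lambda a, b: True)
print(toy_labels)  # [0, 0, 1, 1]
```

In this toy case the threshold removes all four cross-pair edges, so the placeholder agent is consulted only on the two remaining edges instead of all six candidate pairs. In a real system the callback would be an MLLM query, which is where pruning would save the most cost.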