We propose a generic framework for active clustering in which pairwise similarities between objects are obtained through queries. The framework has four main properties. First, a pairwise similarity can be any positive or negative number, which gives full flexibility in the type of feedback a user or annotator can provide. Second, the querying of pairwise similarities is decoupled from the clustering algorithm, which gives more flexibility in how query strategies are constructed. Third, the framework is robust to noisy answers because the same pairwise similarity may be queried multiple times (i.e., a non-persistent noise model is assumed). Finally, the number of clusters is identified automatically from the currently known pairwise similarities. In addition, we propose and analyze several novel query strategies suited to this active clustering framework. We demonstrate the effectiveness of the framework and the proposed query strategies in several experimental studies.
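To make the properties above concrete, the following is a minimal sketch and not the method proposed in the paper. It assumes that repeated noisy answers for a pair are combined by a running mean, and it substitutes a simple greedy assignment rule for the clustering step so that the number of clusters emerges from the signs of the estimated similarities. The names `SimilarityStore`, `noisy_oracle`, and `greedy_cluster`, the flip probability, and the number of repeated queries are all hypothetical choices made for illustration.

```python
import random
from collections import defaultdict


class SimilarityStore:
    """Keeps a running mean of repeated (possibly noisy) answers per unordered pair."""

    def __init__(self):
        self.total = defaultdict(float)
        self.count = defaultdict(int)

    def add(self, i, j, value):
        key = (min(i, j), max(i, j))
        self.total[key] += value
        self.count[key] += 1

    def estimate(self, i, j):
        # Pairs that were never queried are treated as similarity 0 (an assumption).
        key = (min(i, j), max(i, j))
        n = self.count.get(key, 0)
        return self.total[key] / n if n else 0.0


def noisy_oracle(i, j, truth, rng, flip_prob=0.2):
    """Hypothetical annotator: +1 for same cluster, -1 otherwise, sign flipped
    independently on every call (non-persistent noise)."""
    s = 1.0 if truth[i] == truth[j] else -1.0
    return -s if rng.random() < flip_prob else s


def greedy_cluster(n, store):
    """Stand-in clustering step: each object joins the existing cluster with the
    largest positive summed similarity, or opens a new cluster if none is positive.
    The number of clusters is therefore not fixed in advance."""
    clusters = []
    for i in range(n):
        best, best_gain = None, 0.0
        for c in clusters:
            gain = sum(store.estimate(i, j) for j in c)
            if gain > best_gain:
                best, best_gain = c, gain
        if best is None:
            clusters.append([i])
        else:
            best.append(i)
    return clusters


if __name__ == "__main__":
    rng = random.Random(0)
    truth = [0, 0, 0, 1, 1, 1]  # hidden ground-truth labels, used only by the oracle
    store = SimilarityStore()
    # Query strategy (here: every pair, 15 times) is independent of the clustering step.
    for i in range(len(truth)):
        for j in range(i + 1, len(truth)):
            for _ in range(15):
                store.add(i, j, noisy_oracle(i, j, truth, rng))
    print(greedy_cluster(len(truth), store))
```

In this sketch the query loop only writes to the store and the clustering step only reads from it, which mirrors the separation between querying and clustering described above. Any query strategy that decides which pair to ask about next could replace the exhaustive loop without changes to `greedy_cluster`.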