In this paper, we propose a generic framework for active clustering that queries pairwise similarities between objects. The framework has four main properties. First, a pairwise similarity can be any positive or negative number, which gives full flexibility in the type of feedback that a user or annotator can provide. Second, the querying of pairwise similarities is decoupled from the clustering algorithm, which allows more flexibility in how query strategies are constructed. Third, the framework is robust to noisy answers because the same pairwise similarity can be queried multiple times (i.e., a non-persistent noise model is assumed). Finally, the number of clusters is identified automatically from the currently known pairwise similarities. In addition, we propose and analyze a number of novel query strategies suited to this active clustering framework. We demonstrate the effectiveness of the framework and the proposed query strategies in several experimental studies.
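As a rough illustration of two of the properties above (repeated queries under non-persistent noise, and a cluster count that follows from the known similarities), consider the minimal Python sketch below. It is not the method proposed in this paper. The names `SimilarityStore` and `greedy_cluster` are hypothetical, averaging repeated answers is our assumption about how multiple queries for one pair might be combined, and the greedy assignment is a simple stand-in heuristic for clustering signed similarities.

```python
from itertools import combinations


class SimilarityStore:
    """Hypothetical container: keeps the mean of all answers seen per pair."""

    def __init__(self):
        self._sum = {}
        self._count = {}

    def add(self, i, j, value):
        # Pairs are unordered, so (i, j) and (j, i) share one entry.
        key = (min(i, j), max(i, j))
        self._sum[key] = self._sum.get(key, 0.0) + value
        self._count[key] = self._count.get(key, 0) + 1

    def estimate(self, i, j):
        # Unqueried pairs are treated as neutral (similarity 0), an assumption.
        key = (min(i, j), max(i, j))
        if key not in self._count:
            return 0.0
        return self._sum[key] / self._count[key]


def greedy_cluster(n, store):
    """Assign each object to the cluster with the largest positive total
    estimated similarity, or open a new cluster if no total is positive.
    The number of clusters is therefore not fixed in advance."""
    clusters = []
    for obj in range(n):
        best_gain, best_cluster = 0.0, None
        for cluster in clusters:
            gain = sum(store.estimate(obj, member) for member in cluster)
            if gain > best_gain:
                best_gain, best_cluster = gain, cluster
        if best_cluster is None:
            clusters.append([obj])
        else:
            best_cluster.append(obj)
    return clusters


# Usage: five objects with assumed ground-truth labels; answers are +1 / -1.
labels = [0, 0, 1, 1, 2]
store = SimilarityStore()
for i, j in combinations(range(len(labels)), 2):
    store.add(i, j, 1.0 if labels[i] == labels[j] else -1.0)
# Pair (0, 1) is queried twice more; one answer is wrong (noise).
store.add(0, 1, 1.0)
store.add(0, 1, -1.0)
clusters = greedy_cluster(len(labels), store)
```

In this toy run the single wrong answer for pair (0, 1) is outvoted by the two correct ones, so the averaged estimate stays positive and the two objects are still grouped together. A query strategy would decide which pair to ask about next; that part is deliberately left out here.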