Correlation clustering is a well-known unsupervised learning setting that deals with positive and negative pairwise similarities. In this paper, we study the case where the pairwise similarities are not given in advance and must be queried in a cost-efficient way. To this end, we develop a generic active learning framework for this task that offers several advantages, e.g., flexibility in the type of feedback that a user/annotator can provide, adaptability to any correlation clustering algorithm and query strategy, and robustness to noise. In addition, we propose and analyze a number of novel query strategies suited to this setting. We demonstrate the effectiveness of our framework and the proposed query strategies via several experimental studies.