Correlation clustering is a well-known unsupervised learning setting that deals with positive and negative pairwise similarities. In this paper, we study the case where the pairwise similarities are not given in advance and must be queried in a cost-efficient way. To this end, we develop a generic active learning framework for this task that offers several advantages, e.g., flexibility in the type of feedback that a user/annotator can provide, adaptability to any correlation clustering algorithm and query strategy, and robustness to noise. In addition, we propose and analyze a number of novel query strategies suited to this setting. We demonstrate the effectiveness of our framework and the proposed query strategies via several experimental studies.
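
To make the setting concrete, the following is a minimal sketch of a query-then-cluster loop, not the framework or the query strategies proposed in the paper. It assumes a hypothetical `oracle(i, j)` that returns a signed similarity, uses uniform random pair selection as a placeholder query strategy, and substitutes a simple pivot-based heuristic for the correlation clustering step; all function and parameter names are illustrative.

```python
import random


def pivot_clustering(n, sim, rng):
    """Pivot heuristic: each unclustered pivot absorbs every unclustered
    node whose known similarity to it is positive. Unqueried pairs are
    treated as similarity 0 (an assumption of this sketch)."""
    order = list(range(n))
    rng.shuffle(order)
    labels = [-1] * n
    next_label = 0
    for pivot in order:
        if labels[pivot] != -1:
            continue  # already absorbed by an earlier pivot
        labels[pivot] = next_label
        for v in range(n):
            if labels[v] == -1 and sim.get(frozenset((pivot, v)), 0.0) > 0:
                labels[v] = next_label
        next_label += 1
    return labels


def active_correlation_clustering(n, oracle, budget, batch_size=5, seed=0):
    """Alternate between querying a batch of pairs and re-clustering.
    The query strategy here is uniform random selection (a placeholder)."""
    rng = random.Random(seed)
    unqueried = [(i, j) for i in range(n) for j in range(i + 1, n)]
    rng.shuffle(unqueried)
    sim = {}                 # queried similarities, keyed by unordered pair
    labels = list(range(n))  # start with every object in its own cluster
    spent = 0
    while spent < budget and unqueried:
        take = min(batch_size, budget - spent)
        batch, unqueried = unqueried[:take], unqueried[take:]
        for i, j in batch:
            sim[frozenset((i, j))] = oracle(i, j)
        spent += len(batch)
        labels = pivot_clustering(n, sim, rng)
    return labels


# Toy usage: a noise-free oracle derived from a hidden ground truth.
truth = [0, 0, 0, 1, 1, 1]


def toy_oracle(i, j):
    return 1.0 if truth[i] == truth[j] else -1.0


labels = active_correlation_clustering(len(truth), toy_oracle, budget=15)
```

In this toy run the budget covers all 15 pairs and the oracle is noise-free, so the recovered partition matches the hidden one up to a relabeling of clusters. With a smaller budget or a noisy oracle, the result depends on which pairs are queried, which is where the choice of query strategy matters.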