Correlation clustering is an unsupervised learning paradigm that supports both positive and negative pairwise similarities. In this paper, we assume the similarities are not known in advance. Instead, we use active learning to query them iteratively and cost-efficiently. In particular, we develop three acquisition functions for this setting. The first is based on the notion of inconsistency, which arises when similarities violate the transitive property. The other two are based on information-theoretic quantities, namely entropy and information gain.
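To make the notion of inconsistency concrete, the following is a minimal sketch and not the acquisition function developed in the paper. It assumes signed pairwise similarities stored in a dictionary, and treats a triple of objects as inconsistent when exactly one of its three similarities is negative (two positive pairs imply the third should be positive as well). The names `violates_transitivity`, `inconsistent_triples`, `nodes`, and `sim` are hypothetical and chosen only for illustration.

```python
from itertools import combinations


def violates_transitivity(s_uv, s_uw, s_vw):
    """Return True when exactly one of the three similarities is negative.

    Assumption: a similarity of zero is treated as non-negative.
    """
    negatives = sum(1 for s in (s_uv, s_uw, s_vw) if s < 0)
    return negatives == 1


def inconsistent_triples(nodes, sim):
    """List the triples whose pairwise similarities violate transitivity.

    `sim` maps an ordered pair (u, v), with u before v in `nodes`,
    to a signed similarity value (hypothetical data layout).
    """
    bad = []
    for u, v, w in combinations(nodes, 3):
        if violates_transitivity(sim[(u, v)], sim[(u, w)], sim[(v, w)]):
            bad.append((u, v, w))
    return bad


# Toy data: a is similar to b and to c, yet b and c are dissimilar.
nodes = ["a", "b", "c", "d"]
sim = {
    ("a", "b"): 1.0,
    ("a", "c"): 1.0,
    ("b", "c"): -1.0,
    ("a", "d"): -1.0,
    ("b", "d"): -1.0,
    ("c", "d"): -1.0,
}

print(inconsistent_triples(nodes, sim))
```

In this toy example only the triple involving `a`, `b`, and `c` is flagged. Triples with two or three negative similarities are consistent, since their objects can be placed in separate clusters without contradiction.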