Current clustering-based Open Relation Extraction (OpenRE) methods usually adopt a two-stage pipeline. The first stage jointly learns relation representations and cluster assignments. In the second stage, a few instances are labeled manually to name the relation for each cluster. However, unsupervised objectives struggle to yield accurate cluster assignments, and the number of clusters has to be supplied in advance. In this paper, we present a novel setting, named actively supervised clustering for OpenRE. Our key insight is that clustering learning and relation labeling can be performed alternately, providing the necessary guidance for clustering without a significant increase in human effort. The key to this setting is selecting which instances to label. Instead of using classical active labeling strategies designed for a fixed set of known classes, we propose a new strategy that can dynamically discover clusters of unknown relations. Experimental results show that our method discovers almost all relational clusters in the data and improves on state-of-the-art methods by 10.3\% and 5.2\% on two datasets, respectively.
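To make the alternating setting concrete, the following is a minimal sketch, not the strategy proposed in the paper. It assumes instances are already embedded as numeric vectors and substitutes a simple farthest-first rule for instance selection: each round queries the instance farthest from every labeled one, then reassigns all instances to their nearest labeled instance. The names `actively_supervised_clustering`, `oracle`, and `budget` are hypothetical and introduced only for illustration.

```python
import math


def actively_supervised_clustering(points, oracle, budget):
    """Alternate between label queries and cluster reassignment.

    points: list of feature vectors (stand-ins for relation representations).
    oracle: callable mapping an instance index to a relation name
            (stand-in for the human annotator; an assumption of this sketch).
    budget: maximum number of label queries.
    """
    labeled = {}  # instance index -> relation name returned by the oracle
    assignments = [None] * len(points)

    for _ in range(min(budget, len(points))):
        if not labeled:
            query = 0  # arbitrary first query
        else:
            # Selection step (farthest-first, an assumed stand-in rule):
            # query the unlabeled instance farthest from all labeled ones.
            query = max(
                (i for i in range(len(points)) if i not in labeled),
                key=lambda i: min(
                    math.dist(points[i], points[j]) for j in labeled
                ),
            )
        labeled[query] = oracle(query)

        # Clustering step: each instance takes the relation name of its
        # nearest labeled instance, so clusters appear as labels arrive.
        assignments = [
            labeled[min(labeled, key=lambda j: math.dist(points[i], points[j]))]
            for i in range(len(points))
        ]

    return assignments, labeled
```

Note that the number of clusters is never passed in: it is simply the number of distinct relation names the oracle has returned so far, which mirrors the motivation stated above for interleaving labeling with clustering.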