Learning from Semi-Factuals: A Debiased and Semantic-Aware Framework for Generalized Relation Discovery

We introduce a novel task, Generalized Relation Discovery (GRD), for open-world relation extraction. GRD aims to assign unlabeled instances to existing pre-defined relations or to discover novel relations by grouping instances into clusters and giving each cluster a specific meaning. The key challenges of GRD are (1) how to mitigate the severe model bias toward labeled pre-defined relations so that effective relational representations can be learned, and (2) how to determine the specific semantics of novel relations while classifying or clustering unlabeled instances. To address these challenges, we propose a novel framework, SFGRD, which learns from semi-factuals in two stages. The first stage, semi-factual generation, is implemented by a tri-view debiased relation representation module: we take each original sentence as the main view and design two debiased views to generate semi-factual examples for that sentence. The second stage, semi-factual thinking, is executed by a dual-space tri-view collaborative relation learning module, in which we design a cluster-semantic space and a class-index space to learn relational semantics and relation label indices, respectively. In addition, we devise alignment and selection strategies to integrate the two spaces and establish a self-supervised learning loop for unlabeled data by performing semi-factual thinking across the three views. Extensive experimental results show that SFGRD surpasses state-of-the-art models by 2.36\%$\sim$5.78\% in accuracy for relation label indices and by 32.19\%$\sim$84.45\% in cosine similarity for relation semantic quality. To the best of our knowledge, we are the first to exploit the efficacy of semi-factuals in relation extraction.
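
The two evaluation quantities named above, accuracy over relation label indices and cosine similarity for relation semantic quality, can be illustrated with a minimal sketch. The function names, the use of plain Python lists as vectors, and the assumption that predicted label indices are already aligned with gold indices are all illustrative choices made here, not details taken from SFGRD's evaluation protocol.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors (hypothetical helper)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    # Convention chosen for this sketch: a zero vector has similarity 0.
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def label_index_accuracy(predicted, gold):
    """Fraction of instances whose predicted label index equals the gold index.

    Assumes predicted indices are already aligned with gold indices.
    """
    if len(predicted) != len(gold) or not gold:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p == g for p, g in zip(predicted, gold)) / len(gold)
```

For example, `label_index_accuracy([0, 1, 2, 2], [0, 1, 2, 1])` returns `0.75`. When novel relations are discovered by clustering, predicted cluster indices usually have to be mapped to gold labels before such an accuracy is meaningful; that mapping step is left out of this sketch.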
