Relation extraction (RE) models show promising performance in extracting relations between two entities mentioned in a sentence, provided that sufficient annotations are available during training. Such annotations are labor-intensive to obtain in practice. Existing work adopts data augmentation techniques to generate pseudo-annotated sentences beyond the limited annotations. These techniques neither preserve the semantic consistency of the original sentences when rule-based augmentations are adopted, nor preserve the syntactic structure of sentences when relations are expressed using seq2seq models, resulting in less diverse augmentations. In this work, we propose a dedicated augmentation technique for relational texts, named GDA, which uses two complementary modules to preserve both semantic consistency and syntactic structure. We adopt a generative formulation and design a multi-task solution to achieve synergies between the two modules. Furthermore, GDA uses entity hints as prior knowledge for the generative model to produce diverse augmented sentences. Experimental results on three datasets under a low-resource setting show that GDA brings a 2.0% F1 improvement compared with using no augmentation technique. Source code and data are available.