Due to the semantic complexity of the relation extraction (RE) task, obtaining high-quality human-labelled data is an expensive and noisy process. To improve the sample efficiency of models, semi-supervised learning (SSL) methods aim to leverage unlabelled data in addition to learning from a limited number of labelled data points. Recently, strong data augmentation combined with consistency-based semi-supervised learning has advanced the state of the art in several SSL tasks. However, adapting these methods to the RE task has been challenging because data augmentation for RE is difficult. In this work, we leverage recent advances in controlled text generation to perform high-quality data augmentation for the RE task. We further introduce small but significant changes to the model architecture that allow more training data to be generated by interpolating different data points in their latent space. These data augmentations, combined with consistency training, yield highly competitive results for semi-supervised relation extraction on four benchmark datasets.
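The abstract names two generic ingredients without specifying them: interpolating data points in a latent space, and consistency training between predictions on an example and its augmented version. The sketch below illustrates both under stated assumptions. It is not the authors' implementation. It assumes a mixup-style interpolation whose coefficient λ is drawn from Beta(α, α) and folded into [0.5, 1], applied jointly to a hidden vector and its label distribution. It also assumes a KL-divergence consistency objective in the style of UDA. Which layer is interpolated, how λ is sampled, and which divergence is used are not given in the abstract, and all function names here are hypothetical.

```python
import math
import random
from typing import List, Sequence, Tuple

# Hypothetical helpers (not the authors' code) illustrating:
# (1) mixup-style interpolation of two examples in a latent space, and
# (2) a KL-based consistency loss between original and augmented predictions.


def sample_mix_coefficient(alpha: float, rng: random.Random) -> float:
    """Draw lambda ~ Beta(alpha, alpha), folded into [0.5, 1] (a common mixup variant)."""
    lam = rng.betavariate(alpha, alpha)
    return max(lam, 1.0 - lam)


def interpolate_latent(
    h_a: Sequence[float], h_b: Sequence[float],
    y_a: Sequence[float], y_b: Sequence[float],
    lam: float,
) -> Tuple[List[float], List[float]]:
    """Mix two hidden vectors and their label distributions with the same lambda."""
    if len(h_a) != len(h_b) or len(y_a) != len(y_b):
        raise ValueError("paired vectors must have equal length")
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must lie in [0, 1]")
    h_mix = [lam * a + (1.0 - lam) * b for a, b in zip(h_a, h_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return h_mix, y_mix


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax (subtract the max logit before exponentiating)."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def consistency_loss(logits_orig: Sequence[float], logits_aug: Sequence[float]) -> float:
    """KL(p_orig || p_aug): how far the prediction on the augmented input drifts
    from the prediction on the original input. During training, p_orig would
    usually be treated as a fixed target (no gradient flows through it)."""
    p = softmax(logits_orig)
    q = softmax(logits_aug)
    # Clamp q to avoid division by zero if a probability underflows to 0.0.
    return sum(pi * math.log(pi / max(qi, 1e-12)) for pi, qi in zip(p, q) if pi > 0.0)


if __name__ == "__main__":
    rng = random.Random(0)
    lam = sample_mix_coefficient(alpha=0.4, rng=rng)
    h_mix, y_mix = interpolate_latent(
        [1.0, 0.0, 2.0], [0.0, 1.0, 0.0],  # hidden vectors of two examples
        [1.0, 0.0], [0.0, 1.0],            # their one-hot relation labels
        lam,
    )
    print(f"lambda={lam:.3f} h_mix={h_mix} y_mix={y_mix}")
    print("consistency:", consistency_loss([2.0, 0.5, -1.0], [1.5, 0.7, -0.8]))
```

Folding λ into [0.5, 1] keeps each mixed point closer to its first example, so the interpolated label stays dominated by one relation. Whether the paper does this is an assumption.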