Scientific Information Extraction (ScientificIE) is a critical task that involves identifying scientific entities and their relationships. The task is made harder by the need for domain-specific knowledge and the limited availability of annotated data. Two of the most popular datasets for ScientificIE are SemEval-2018 Task-7 and SciERC. They contain overlapping samples but follow different annotation schemes, which leads to conflicting labels. In this study, we first introduce a novel multi-task learning approach to address these label variations. We then propose a soft labeling technique that converts inconsistent labels into probability distributions. Experimental results demonstrate that the proposed method makes the model more robust to label noise and improves end-to-end performance on both ScientificIE tasks. Our analysis reveals that label variations are particularly effective in handling ambiguous instances. Furthermore, the richness of the information captured by label variations can potentially reduce data size requirements. These findings highlight the importance of releasing variation labels and encourage future research on other tasks and in other domains. Overall, this study demonstrates the effectiveness of multi-task learning and the potential of label variations to enhance ScientificIE performance.
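As a rough illustration of the soft labeling idea, the sketch below turns the labels that different annotation schemes assign to one overlapping instance into a probability distribution, and scores model logits against that distribution with a soft cross-entropy. It is not the method used in the study: the abstract does not say how the distributions are computed, so the normalized-count rule, the shared label space `LABELS`, the relation names, and the helper names `soft_label` and `soft_cross_entropy` are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical shared label space (assumption). The two datasets use
# different relation inventories, so a mapping into one space is presumed.
LABELS = ["USED-FOR", "PART-OF", "COMPARE", "NONE"]


def soft_label(annotations, labels=LABELS, smoothing=0.0):
    """Convert the labels given to one instance into a probability distribution.

    annotations: one label per annotation source for the same instance.
    smoothing:   optional additive smoothing applied to every class.
    """
    counts = Counter(annotations)
    unknown = set(counts) - set(labels)
    if unknown:
        raise ValueError(f"labels outside the label space: {sorted(unknown)}")
    total = len(annotations) + smoothing * len(labels)
    if total == 0:
        raise ValueError("need at least one annotation or smoothing > 0")
    return [(counts[label] + smoothing) / total for label in labels]


def soft_cross_entropy(logits, target):
    """Cross-entropy between a soft target and softmax(logits)."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return -sum(t * (x - log_z) for t, x in zip(target, logits) if t > 0)


# Both sources agree: the soft label reduces to a one-hot vector.
agree = soft_label(["USED-FOR", "USED-FOR"])

# The sources conflict: the probability mass is split between both labels.
conflict = soft_label(["USED-FOR", "PART-OF"])

# Loss of a model that is uniform over the four classes.
uniform_loss = soft_cross_entropy([0.0, 0.0, 0.0, 0.0], conflict)
```

Under this assumed rule, an instance on which the schemes agree keeps a hard one-hot target, while a conflicting instance is trained toward a split distribution instead of being forced to a single, possibly wrong, label.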