Information extraction (IE) is a crucial subfield of natural language processing. However, sentence classification and Named Entity Recognition have traditionally been treated as separate tasks, and the interactions between them remain largely uninvestigated. In this study, we propose an integrative analysis that combines sentence classification with Named Entity Recognition, with the objective of revealing and understanding the mutual reinforcement effect between these two information extraction subtasks. To achieve this, we introduce a Sentence Classification and Named Entity Recognition Multi-task (SCNM) approach that combines Sentence Classification (SC) and Named Entity Recognition (NER). We develop a Sentence-to-Label Generation (SLG) framework for SCNM and construct a Wikipedia dataset annotated for both SC and NER. Using a format converter, we unify the input formats and employ a generative model to generate SC labels, NER labels, and the associated text segments. We propose a Constraint Mechanism (CM) to improve the format accuracy of the generated output. Our results show that SC accuracy increased by 1.13 points and NER by 1.06 points in SCNM compared to the standalone tasks, with CM raising format accuracy from 63.61 to 100. These findings indicate mutual reinforcement effects between SC and NER and show that integrating the two tasks enhances the performance of both. We additionally applied the SLG framework to the SC task alone, where it yielded higher accuracy than the baseline on two distinct Japanese SC datasets. Notably, in few-shot learning experiments, the SLG framework performs much better than the fine-tuning method. These empirical findings provide additional evidence for the efficacy of the SLG framework.
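To make the notions of a unified output format and "format accuracy" concrete, the following is a minimal sketch under stated assumptions. The abstract does not specify the serialization used by the format converter, the label inventories, or how format accuracy is computed, so the `sc_label | NER_LABEL: span ; ...` layout, the label sets, and the function names `to_target`, `parse_target`, and `format_accuracy` are all hypothetical and serve only as an illustration. It does not implement the Constraint Mechanism itself.

```python
# Hypothetical label inventories; the abstract does not list the real ones.
SC_LABELS = {"sports", "politics"}
NER_LABELS = {"PERSON", "LOCATION"}


def to_target(sc_label, entities):
    """Serialize one example as 'sc_label | NER_LABEL: span ; NER_LABEL: span'.

    This layout is an assumption made for illustration, not the paper's format.
    """
    ner_part = " ; ".join(f"{label}: {span}" for label, span in entities)
    return f"{sc_label} | {ner_part}"


def parse_target(text):
    """Return (sc_label, entities) if text follows the assumed layout, else None."""
    sc_part, sep, ner_part = text.partition(" | ")
    if not sep or sc_part not in SC_LABELS:
        return None
    entities = []
    if ner_part:
        for chunk in ner_part.split(" ; "):
            label, sep, span = chunk.partition(": ")
            if not sep or label not in NER_LABELS or not span:
                return None
            entities.append((label, span))
    return sc_part, entities


def format_accuracy(outputs):
    """Percentage of generated strings that parse under the assumed layout."""
    if not outputs:
        return 0.0
    valid = sum(parse_target(o) is not None for o in outputs)
    return 100.0 * valid / len(outputs)


if __name__ == "__main__":
    target = to_target("sports", [("PERSON", "Ichiro"), ("LOCATION", "Seattle")])
    generated = [
        target,                    # well-formed
        "sports PERSON Ichiro",    # missing separators
        "weather | PERSON: X",     # unknown SC label
        "politics | CITY: Tokyo",  # unknown NER label
    ]
    print(target)
    print(format_accuracy(generated))
```

Under this reading, a format accuracy of 100 would mean that every generated string parses back into an SC label plus a list of labeled spans. Whether the output is well-formed is a separate question from whether the labels and spans are correct.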