Sequential Sentence Classification (SSC) in medical abstracts is the task of assigning each sentence to one of a set of pre-defined headings according to the role it plays in conveying the abstract's key information. In the SSC task, sentences are sequentially related to each other. Sentence embeddings are therefore crucial: they must capture both the semantic information among the words of a sentence and the contextual relationships among the sentences of an abstract, which in turn improves SSC system performance. In this paper, we propose an LSTM-based deep learning network that focuses on building comprehensive sentence representations at the sentence level. To demonstrate the efficacy of these representations, we also develop a system that uses the resulting sentence embeddings, consisting of a Convolutional-Recurrent neural network (C-RNN) at the abstract level and a multi-layer perceptron (MLP) at the segment level. Our proposed system yields highly competitive results compared to state-of-the-art systems and improves the F1 scores of the baseline by 1.0%, 2.8%, and 2.6% on the benchmark datasets PubMed 200K RCT, PubMed 20K RCT, and NICTA-PIBOSO, respectively. This indicates that improving sentence representation has a significant impact on model performance.
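To make the task framing and the reported metric concrete, the following is a minimal sketch and not the system described above. It represents one abstract as an ordered list of sentences with heading labels and scores a hypothetical prediction with a support-weighted F1. The five heading names are those used by the PubMed RCT datasets; the toy sentences, the `pred` sequence, and the function names `per_class_f1` and `weighted_f1` are illustrative assumptions, and the choice of weighted averaging is also an assumption, since the abstract does not say which F1 variant is reported.

```python
from collections import Counter

# Heading labels used by the PubMed RCT datasets. The sentences below are
# made-up toy examples, not taken from any dataset.
LABELS = ["BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"]

# One abstract is an ordered list of (sentence, gold label) pairs; the order
# matters because neighbouring sentences tend to share or follow a heading.
abstract = [
    ("Hypertension is common in older adults.", "BACKGROUND"),
    ("We tested whether drug X lowers blood pressure.", "OBJECTIVE"),
    ("Patients were randomized to drug X or placebo.", "METHODS"),
    ("Blood pressure was measured at 12 weeks.", "METHODS"),
    ("Drug X reduced systolic pressure.", "RESULTS"),
    ("Drug X may be a useful treatment.", "CONCLUSIONS"),
]

gold = [label for _, label in abstract]
# Hypothetical system output with one error: the 4th sentence is tagged RESULTS.
pred = ["BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "RESULTS", "CONCLUSIONS"]


def per_class_f1(gold, pred, labels):
    """Return {label: F1} computed from true/false positives and false negatives."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1
        else:
            fp[p] += 1  # predicted label gets a false positive
            fn[g] += 1  # gold label gets a false negative
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def weighted_f1(gold, pred, labels):
    """Average per-class F1, weighting each class by its number of gold sentences."""
    scores = per_class_f1(gold, pred, labels)
    support = Counter(gold)
    return sum(scores[label] * support[label] for label in labels) / len(gold)


if __name__ == "__main__":
    print(per_class_f1(gold, pred, LABELS))
    print(round(weighted_f1(gold, pred, LABELS), 4))
```

In this toy case the single mislabeled sentence lowers the F1 of both METHODS and RESULTS, which shows how one error in a label sequence affects two classes at once.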