Sequential sentence classification (SSC) in medical abstracts assigns each sentence to one of a set of pre-defined headings according to the role it plays in conveying the abstract's key information. Sentences in an abstract are often sequentially related to one another. Sentence embeddings are therefore crucial: they must capture both the semantic relationships between words within a sentence and the contextual relationships between sentences within the abstract, giving a comprehensive representation for classification. In this paper, we present a hierarchical deep learning model for the SSC task. First, at the sentence level, we propose an LSTM-based network with multiple feature branches to create informative sentence embeddings. To model the sequence of sentences, we then develop a convolutional-recurrent neural network (C-RNN) at the abstract level and a multi-layer perceptron (MLP) at the segment level, which further enhance model performance. We also conduct an ablation study to evaluate the contribution of each individual component to model performance at the different levels. Our proposed system is highly competitive with state-of-the-art systems and improves the F1 scores of the baseline by 1.0%, 2.8%, and 2.6% on the benchmark datasets PubMed 200K RCT, PubMed 20K RCT, and NICTA-PIBOSO, respectively.
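To make the task framing concrete, the following is a minimal sketch of how an SSC example might be represented and how each sentence can be paired with its neighbouring sentences, which is the kind of sequential context the abstract argues is important. It is not the model proposed in the paper. The label set, the `Abstract` class, the `context_windows` helper, and the example sentences are all assumptions made up for illustration; the actual headings depend on the dataset.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical label set, for illustration only; the actual headings
# depend on the dataset being used.
LABELS = ["BACKGROUND", "OBJECTIVE", "METHODS", "RESULTS", "CONCLUSIONS"]


@dataclass
class Abstract:
    """One SSC example: ordered sentences with one heading per sentence."""
    sentences: List[str]  # sentences in their original order
    labels: List[str]     # one heading per sentence, same order


def context_windows(sentences: List[str], radius: int = 1) -> List[List[str]]:
    """Return, for each sentence, the sentence plus up to `radius`
    neighbours on each side (fewer at the start and end)."""
    windows = []
    for i in range(len(sentences)):
        start = max(0, i - radius)
        end = min(len(sentences), i + radius + 1)
        windows.append(sentences[start:end])
    return windows


# Made-up toy abstract, not taken from any benchmark dataset.
example = Abstract(
    sentences=[
        "Sentence classification helps readers skim abstracts.",
        "We trained a model on labelled abstracts.",
        "The model improved F1 over the baseline.",
    ],
    labels=["BACKGROUND", "METHODS", "RESULTS"],
)

windows = context_windows(example.sentences)
for label, window in zip(example.labels, windows):
    # Each target label is paired with the sentence and its neighbours.
    print(label, len(window))
```

A fixed window is only the simplest way to expose neighbouring sentences to a classifier; a recurrent or convolutional layer over the whole abstract, as described above, learns this context instead of hard-coding it.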