Hospital discharge documentation is among the most essential, yet most time-consuming, documents written by medical practitioners. The objective of this study was to automatically generate hospital discharge summaries using neural network summarization models. We studied various data preparation and neural network training techniques for generating discharge summaries. Using nursing notes and discharge summaries from the MIMIC-III dataset, we studied the viability of automatically generating individual sections of a discharge summary with four state-of-the-art neural network summarization models (BART, T5, Longformer, and FLAN-T5). Our experiments indicated that training setups that use nursing notes as the source and discrete sections of the discharge summary (e.g., "History of Present Illness") as the target output improve language model efficiency and text quality. According to our findings, the fine-tuned BART model improved its ROUGE F1 score by 43.6% over its standard off-the-shelf version. We also found that fine-tuning the baseline BART model with other setups yielded different degrees of improvement (up to 80% relative improvement). We further observed that a fine-tuned T5 generally achieves higher ROUGE F1 scores than the other fine-tuned models, and that a fine-tuned FLAN-T5 achieves the highest ROUGE score overall, 45.6. For the majority of the fine-tuned language models, summarizing discharge summary sections separately quantitatively outperformed summarizing the entire report. In contrast, language models that had previously been instruction fine-tuned performed better when fine-tuned to summarize entire reports. This study concludes that a focused dataset designed for the automatic generation of discharge summaries by a language model can produce coherent discharge summary sections.
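To make the reported quantities concrete, the following is a minimal sketch of how a unigram ROUGE F1 score and a relative improvement between two scores can be computed. It is an illustration under stated assumptions, not the evaluation code used in this study: the abstract does not specify the ROUGE variant or tokenization, so ROUGE-1 with lowercase whitespace tokenization is assumed here, and the function names `rouge1_f1` and `relative_improvement` are hypothetical.

```python
from collections import Counter


def rouge1_f1(candidate: str, reference: str) -> float:
    """Unigram-overlap F1 between a generated and a reference summary.

    Assumption: lowercase whitespace tokenization, no stemming.
    """
    cand_counts = Counter(candidate.lower().split())
    ref_counts = Counter(reference.lower().split())
    # Clipped overlap: each token counts at most as often as it
    # appears in both texts.
    overlap = sum((cand_counts & ref_counts).values())
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand_counts.values())
    recall = overlap / sum(ref_counts.values())
    return 2 * precision * recall / (precision + recall)


def relative_improvement(baseline: float, fine_tuned: float) -> float:
    """Percentage gain of a fine-tuned score over a non-zero baseline."""
    return (fine_tuned - baseline) / baseline * 100.0


if __name__ == "__main__":
    # Made-up example texts, not taken from MIMIC-III.
    generated = "patient admitted with chest pain"
    reference = "patient admitted with severe chest pain"
    print(f"ROUGE-1 F1: {rouge1_f1(generated, reference):.3f}")
    # Hypothetical scores chosen only to show the arithmetic.
    print(f"Relative gain: {relative_improvement(0.25, 0.5):.1f}%")
```

Under this reading, a figure such as "43.6%" denotes the gain relative to the baseline score rather than a difference in absolute ROUGE points.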