Patients with low health literacy often have difficulty understanding medical jargon and the complex structure of professional medical language. Although several studies have proposed methods for automatically translating expert language into language a layperson can understand, only a few address accuracy and readability simultaneously in the clinical domain. Clinical language simplification therefore remains a challenging task that previous work has not fully addressed. To benchmark this task, we construct a new dataset named MedLane to support the development and evaluation of automated clinical language simplification approaches. We also propose a new model called DECLARE that follows the human annotation procedure and achieves state-of-the-art performance compared with eight strong baselines. To evaluate performance fairly, we further propose three task-specific evaluation metrics. Experimental results demonstrate the utility of the annotated MedLane dataset and the effectiveness of the proposed DECLARE model.