The way we analyse clinical texts has changed substantially in recent years. The introduction of language models such as BERT led to adaptations for the (bio)medical domain, such as PubMedBERT and ClinicalBERT. These models rely on large databases of archived medical documents. Although they perform well in terms of accuracy, their lack of interpretability and limited transferability across languages restrict their use in clinical settings. We introduce a novel lightweight graph-based embedding method tailored specifically to radiology reports. It takes into account the structure and composition of the report, and connects the medical terms in the report through the multilingual SNOMED Clinical Terms knowledge base. The resulting graph embedding uncovers the underlying relationships among clinical terms, yielding a representation that is easier for clinicians to understand and clinically more accurate, without relying on large pre-training datasets. We demonstrate the embedding on two tasks: disease classification of X-ray reports and image classification. For disease classification, our model is competitive with its BERT-based counterparts while being orders of magnitude smaller in model size and training data requirements. For image classification, we show the effectiveness of the graph embedding in leveraging cross-modal knowledge transfer, and we show how the method can be used across different languages.
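To make the idea of linking report structure to a multilingual terminology more concrete, the following is a minimal sketch and not the method described above. It assumes a toy lookup table in place of SNOMED Clinical Terms; the concept identifiers, the relation table, and the function `build_report_graph` are all hypothetical, and tokenisation is deliberately naive (negation such as "no effusion" is ignored).

```python
from collections import defaultdict

# Hypothetical stand-in for a multilingual terminology lookup. The concept
# identifiers are placeholders, NOT real SNOMED CT codes.
TOY_TERMINOLOGY = {
    "pneumonia": "CONCEPT_PNEUMONIA",
    "longontsteking": "CONCEPT_PNEUMONIA",  # Dutch term for pneumonia
    "effusion": "CONCEPT_EFFUSION",
}

# Hypothetical child -> parent relations between concepts.
TOY_RELATIONS = {
    "CONCEPT_PNEUMONIA": "CONCEPT_LUNG_DISORDER",
    "CONCEPT_EFFUSION": "CONCEPT_PLEURAL_DISORDER",
}


def build_report_graph(report):
    """Build an undirected graph from a report given as {section: text}.

    Nodes are (kind, name) tuples; the result maps each node to its neighbours.
    """
    graph = defaultdict(set)

    def add_edge(a, b):
        graph[a].add(b)
        graph[b].add(a)

    root = ("report", "root")
    for section, text in report.items():
        section_node = ("section", section)
        add_edge(root, section_node)
        # Naive tokenisation: lowercase, strip basic punctuation, split on spaces.
        tokens = text.lower().replace(".", " ").replace(",", " ").split()
        for token in tokens:
            concept = TOY_TERMINOLOGY.get(token)
            if concept is None:
                continue  # token is not a known medical term
            concept_node = ("concept", concept)
            add_edge(section_node, concept_node)
            parent = TOY_RELATIONS.get(concept)
            if parent is not None:
                add_edge(concept_node, ("concept", parent))
    return graph


english = build_report_graph({"findings": "Signs of pneumonia, no effusion."})
dutch = build_report_graph({"findings": "Tekenen van longontsteking."})

# Nodes present in both graphs, despite the reports being in different languages.
shared = set(english) & set(dutch)
```

In this sketch the English and Dutch reports share the same concept node because both surface terms map to one language-independent identifier, which is the property that would let a graph-based representation be reused across languages. A real system would additionally need proper term extraction, negation handling, and an embedding step over the graph.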