The way we analyse clinical texts has changed substantially in recent years. The introduction of language models such as BERT led to adaptations for the (bio)medical domain, such as PubMedBERT and ClinicalBERT. These models rely on large databases of archived medical documents. Although they achieve high accuracy, their lack of interpretability and limited transferability across languages restrict their use in clinical settings. We introduce a novel lightweight graph-based embedding method tailored specifically to radiology reports. It takes into account the structure and composition of the report, while also connecting medical terms in the report through the multilingual SNOMED Clinical Terms knowledge base. The resulting graph embedding uncovers the underlying relationships among clinical terms, yielding a representation that is more understandable to clinicians and more clinically accurate, without relying on large pre-training datasets. We demonstrate the use of this embedding on two tasks: disease classification of X-ray reports and image classification. For disease classification, our model is competitive with its BERT-based counterparts, while being orders of magnitude smaller in both model size and training data requirements. For image classification, we demonstrate the effectiveness of the graph embedding through cross-modal knowledge transfer and show that the method can be used across different languages.
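To make the idea of a report graph concrete, the following is a minimal sketch of how report structure and terminology links could be combined into a single graph. It is not the authors' implementation: the node naming scheme, the `TERM_TO_CONCEPT` lookup, the `IS_A` links, and the function `build_report_graph` are all hypothetical, and the concept identifiers are placeholders rather than real SNOMED CT codes. Term matching is plain substring search, so negation (for example "No pleural effusion") is deliberately not handled.

```python
from collections import defaultdict

# Hypothetical stand-in for a terminology lookup. The identifiers are
# placeholders for illustration, not real SNOMED CT codes.
TERM_TO_CONCEPT = {
    "pneumonia": "concept:pneumonia",
    "consolidation": "concept:consolidation",
    "pleural effusion": "concept:pleural_effusion",
}

# Hypothetical "is-a" links between concepts (child -> parent).
IS_A = {
    "concept:pneumonia": "concept:lung_disorder",
    "concept:consolidation": "concept:lung_finding",
    "concept:pleural_effusion": "concept:pleural_disorder",
}


def build_report_graph(report):
    """Build an undirected graph {node: set(neighbours)} from a report.

    `report` maps a section name to a list of sentences. The graph links
    report -> section -> sentence -> concept -> parent concept.
    """
    graph = defaultdict(set)

    def add_edge(a, b):
        graph[a].add(b)
        graph[b].add(a)

    for section, sentences in report.items():
        section_node = f"section:{section}"
        add_edge("report", section_node)
        for i, sentence in enumerate(sentences):
            sentence_node = f"sentence:{section}:{i}"
            add_edge(section_node, sentence_node)
            lowered = sentence.lower()
            # Naive substring matching; negation is not handled here.
            for term, concept in TERM_TO_CONCEPT.items():
                if term in lowered:
                    add_edge(sentence_node, concept)
                    parent = IS_A.get(concept)
                    if parent is not None:
                        add_edge(concept, parent)
    return dict(graph)


report = {
    "findings": ["Right lower lobe consolidation.", "No pleural effusion."],
    "impression": ["Findings consistent with pneumonia."],
}
graph = build_report_graph(report)
```

In this sketch, sentences from different sections become connected whenever their concepts share a parent in the toy hierarchy, which is the kind of relationship between clinical terms that a graph embedding could then encode. Because the concept nodes are language-independent identifiers, swapping in a lexicon for another language would leave the rest of the graph unchanged.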