Automating medical image interpretation could alleviate bottlenecks in diagnostic workflows and has attracted particular interest in recent years owing to advances in natural language processing. Great strides have been made towards AI-based automated radiology report generation, yet ensuring the clinical accuracy of generated reports remains a significant challenge that hinders the deployment of such methods in clinical practice. In this work, we propose a quality control framework that assesses the reliability of AI-generated radiology reports with respect to diagnostically important semantics, using modular auxiliary auditing components (ACs). Evaluating our pipeline on the MIMIC-CXR dataset, we show that incorporating ACs in the form of disease classifiers enables auditing that identifies more reliable reports, yielding higher F1 scores than unfiltered generated reports. Additionally, leveraging the confidence of the AC labels further improves the audit's effectiveness.
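The auditing idea can be illustrated with a minimal sketch. It assumes that disease labels have already been extracted from each generated report and that an AC outputs a per-disease probability from the image; the names `Case`, `audit`, `f1`, and `min_confidence`, as well as the 0.5 decision threshold, are hypothetical choices made for illustration and are not taken from the framework itself.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Case:
    # Binary disease labels extracted from the generated report
    # (the extraction step is assumed to exist and is not shown here).
    report_labels: Dict[str, int]
    # AC (disease classifier) probability that each disease is present.
    ac_probs: Dict[str, float]


def audit(case: Case, diseases: Iterable[str], min_confidence: float = 0.5) -> bool:
    """Pass a report only if the AC agrees with it on every audited disease
    and the AC is at least `min_confidence` sure of its own label."""
    for disease in diseases:
        prob = case.ac_probs[disease]
        ac_label = int(prob >= 0.5)        # hypothetical decision threshold
        confidence = max(prob, 1.0 - prob)  # confidence in the chosen label
        if ac_label != case.report_labels[disease] or confidence < min_confidence:
            return False
    return True


def f1(preds: List[int], truths: List[int]) -> float:
    """Binary F1 score; returns 0.0 when there are no positives at all."""
    tp = sum(1 for p, t in zip(preds, truths) if p == 1 and t == 1)
    fp = sum(1 for p, t in zip(preds, truths) if p == 1 and t == 0)
    fn = sum(1 for p, t in zip(preds, truths) if p == 0 and t == 1)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0
```

Under these assumptions, comparing `f1` over all generated reports with `f1` over only those that pass `audit` mirrors the filtered versus unfiltered comparison, and raising `min_confidence` corresponds to using the confidence of the AC labels to make the audit stricter.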