Large medical imaging data sets are becoming increasingly available, but ensuring that samples are free of significant artefacts remains challenging. Existing methods for identifying imperfections in medical images rely on data-intensive approaches, a limitation compounded by the scarcity of artefact-rich scans for training machine learning models in clinical research. To tackle this problem, we propose a framework with four main components: 1) artefact generators inspired by magnetic resonance physics, which corrupt brain MRI scans to augment a training data set, 2) abstract and engineered features that represent images compactly, 3) a feature selection process that depends on the artefact class to improve classification, and 4) SVM classifiers that identify artefacts. Our contributions are threefold. First, physics-based artefact generators produce synthetic brain MRI scans with controlled artefacts for data augmentation, which avoids the labour-intensive collection and labelling of scans with rare artefacts. Second, we propose a pool of abstract and engineered image features to identify nine different artefacts in structural MRI. Finally, we use an artefact-based feature selection block that, for each class of artefacts, finds the set of features providing the best classification performance. We performed validation experiments on a large data set of scans with artificially generated artefacts, and on a multiple sclerosis clinical trial in which real artefacts were identified by experts, showing that the proposed pipeline outperforms traditional methods. In particular, our data augmentation increases performance by up to 12.5 percentage points in accuracy, precision, and recall. The computational efficiency of our pipeline enables potential real-time deployment, promising high-throughput clinical applications through automated image-processing pipelines driven by quality control systems.
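To make the idea of a physics-inspired artefact generator concrete, the following is a minimal sketch of one possible generator for a motion artefact. It is not the implementation described in this abstract, which does not specify its generators. The sketch assumes a 2D slice, rigid in-plane translation, and one k-space row per phase-encode step; the function name `add_motion_artefact` and its parameters `max_shift` and `corrupted_fraction` are hypothetical. It relies on the Fourier shift theorem: translating the object multiplies its k-space signal by a linear phase ramp.

```python
import numpy as np


def add_motion_artefact(image, max_shift=3.0, corrupted_fraction=0.3, seed=0):
    """Corrupt a 2D slice with a simulated translational motion artefact.

    Assumption (illustrative only): each k-space row is acquired at a
    different time, and the subject is translated by a random (dy, dx)
    during a random subset of those rows.
    """
    rng = np.random.default_rng(seed)
    ny, nx = image.shape

    # Move to k-space, where the acquisition is modelled.
    kspace = np.fft.fft2(image)

    # Spatial frequencies in cycles per pixel along each axis.
    ky = np.fft.fftfreq(ny)
    kx = np.fft.fftfreq(nx)

    # Pick the phase-encode rows acquired while the subject had moved.
    n_bad = int(round(corrupted_fraction * ny))
    bad_rows = rng.permutation(ny)[:n_bad]
    shifts = rng.uniform(-max_shift, max_shift, size=(n_bad, 2))

    corrupted = kspace.copy()
    for row, (dy, dx) in zip(bad_rows, shifts):
        # Fourier shift theorem: a translation is a linear phase ramp.
        phase = np.exp(-2j * np.pi * (ky[row] * dy + kx * dx))
        corrupted[row, :] = kspace[row, :] * phase

    # Magnitude reconstruction, as is usual for structural MRI.
    return np.abs(np.fft.ifft2(corrupted))


# Toy phantom: a bright rectangle on a dark background (not real MRI data).
phantom = np.zeros((64, 64))
phantom[20:44, 24:40] = 1.0
corrupted = add_motion_artefact(phantom, max_shift=3.0, corrupted_fraction=0.3)
```

Varying `max_shift` and `corrupted_fraction` gives controlled artefact severity, so each synthetic scan carries a known label. In a full pipeline, features extracted from such scans would feed the per-artefact feature selection and SVM classification stages.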