Improving Image Classification of Knee Radiographs: An Automated Image Labeling Approach

Knee radiology practices hold large numbers of radiographic images that could be used to train deep learning models for diagnosing knee abnormalities. However, these images typically lack readily available labels because of the limitations of human annotation. The purpose of our study was to develop an automated labeling approach that improves an image classification model for distinguishing normal knee images from those with abnormalities or prior arthroplasty. The automated labeler was trained on a small set of labeled data and then used to label a much larger set of unlabeled data, further improving image classification performance for knee radiographic diagnosis. We developed our approach using 7,382 patients and validated it on a separate set of 637 patients. The final image classification model, trained using both manually labeled and pseudo-labeled data, achieved a higher weighted average AUC (WAUC: 0.903) and higher AUC-ROC values in all classes (normal AUC-ROC: 0.894; abnormal AUC-ROC: 0.896; arthroplasty AUC-ROC: 0.990) than the baseline model trained using only manually labeled data (WAUC: 0.857; normal AUC-ROC: 0.842; abnormal AUC-ROC: 0.848; arthroplasty AUC-ROC: 0.987). DeLong tests showed that the improvement was significant for normal (p-value<0.002) and abnormal (p-value<0.001) images. Our findings demonstrate that the proposed automated labeling approach significantly improves image classification performance for radiographic knee diagnosis, which could facilitate patient care and the curation of large knee datasets.
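
The labeling workflow described in the abstract (train a labeler on a small labeled set, pseudo-label a larger unlabeled set, then train the final model on both) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the nearest-centroid "model", the 2-D synthetic features standing in for radiographs, the function names, and the 0.9 confidence threshold. The abstract does not state which architecture or which selection rule for pseudo-labels the study used.

```python
import math
import random

CLASSES = ["normal", "abnormal", "arthroplasty"]

def fit_centroids(features, labels):
    """Toy stand-in for a classifier: one mean feature vector per class."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        acc = sums.setdefault(y, [0.0] * len(x))
        for i, v in enumerate(x):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict_proba(centroids, x):
    """Softmax over negative squared distances to each class centroid."""
    scores = {y: -sum((a - b) ** 2 for a, b in zip(x, c))
              for y, c in centroids.items()}
    top = max(scores.values())
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}

def pseudo_label(centroids, unlabeled, threshold=0.9):
    """Keep only predictions whose top probability reaches the threshold.

    The threshold is a hypothetical selection rule, not one from the study.
    """
    kept_x, kept_y = [], []
    for x in unlabeled:
        probs = predict_proba(centroids, x)
        label = max(probs, key=probs.get)
        if probs[label] >= threshold:
            kept_x.append(x)
            kept_y.append(label)
    return kept_x, kept_y

def make_points(rng, n_per_class):
    """Synthetic 2-D features; real inputs would be radiograph images."""
    centers = {"normal": (0.0, 0.0), "abnormal": (3.0, 0.0),
               "arthroplasty": (0.0, 3.0)}
    xs, ys = [], []
    for label, (cx, cy) in centers.items():
        for _ in range(n_per_class):
            xs.append([rng.gauss(cx, 0.7), rng.gauss(cy, 0.7)])
            ys.append(label)
    return xs, ys

rng = random.Random(0)
labeled_x, labeled_y = make_points(rng, 5)      # small manually labeled set
unlabeled_x, _ = make_points(rng, 200)          # larger set, labels discarded

# Step 1: train the labeler on the small labeled set.
labeler = fit_centroids(labeled_x, labeled_y)
# Step 2: pseudo-label the unlabeled set, keeping confident predictions.
pseudo_x, pseudo_y = pseudo_label(labeler, unlabeled_x, threshold=0.9)
# Step 3: train the final model on manual plus pseudo-labeled data.
final_model = fit_centroids(labeled_x + pseudo_x, labeled_y + pseudo_y)

print(f"kept {len(pseudo_x)} of {len(unlabeled_x)} unlabeled samples")
```

Filtering by confidence is one common way to limit label noise in the pseudo-labeled set; a lower threshold keeps more samples at the cost of more wrong labels.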

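
As a rough illustration of the reported metrics, the sketch below computes a one-vs-rest AUC-ROC per class and combines them into a weighted average. The weighting by class frequency in the evaluation set is an assumption; the abstract does not define how the WAUC weights were chosen. The function names and the six toy predictions are hypothetical and unrelated to the study data, and the DeLong significance test is not implemented here.

```python
def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outranks a negative.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, p in zip(scores, is_positive) if p]
    neg = [s for s, p in zip(scores, is_positive) if not p]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def weighted_average_auc(y_true, y_prob, classes):
    """One-vs-rest AUC per class, averaged with class-frequency weights."""
    total = len(y_true)
    per_class = {}
    wauc = 0.0
    for c in classes:
        scores = [p[c] for p in y_prob]
        is_pos = [y == c for y in y_true]
        auc = binary_auc(scores, is_pos)
        per_class[c] = auc
        wauc += auc * sum(is_pos) / total   # weight = share of class c
    return wauc, per_class

classes = ["normal", "abnormal", "arthroplasty"]
# Toy ground truth and predicted class probabilities (illustrative only).
y_true = ["normal", "normal", "abnormal", "abnormal", "abnormal", "arthroplasty"]
y_prob = [
    {"normal": 0.70, "abnormal": 0.20, "arthroplasty": 0.10},
    {"normal": 0.40, "abnormal": 0.50, "arthroplasty": 0.10},
    {"normal": 0.30, "abnormal": 0.60, "arthroplasty": 0.10},
    {"normal": 0.50, "abnormal": 0.40, "arthroplasty": 0.10},
    {"normal": 0.45, "abnormal": 0.45, "arthroplasty": 0.10},
    {"normal": 0.10, "abnormal": 0.10, "arthroplasty": 0.80},
]

wauc, per_class = weighted_average_auc(y_true, y_prob, classes)
for name, auc in per_class.items():
    print(f"{name} AUC-ROC: {auc:.3f}")
print(f"WAUC: {wauc:.3f}")
```

The pairwise comparison is quadratic in the number of samples, which is fine for a sketch; a rank-based computation would be preferable for a validation set of realistic size.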