Comprehensive Evaluation and Insights into the Use of Deep Neural Networks to Detect and Quantify Lymphoma Lesions in PET/CT Images

Shadab Ahamed, Yixi Xu, Claire Gowdy, Joo H. O, Ingrid Bloise, Don Wilson, Patrick Martineau, François Bénard, Fereshteh Yousefirizi, Rahul Dodhia, Juan M. Lavista, William B. Weeks, Carlos F. Uribe, Arman Rahmim

From arXiv; 12 pages, 10 figures, 2 tables

This study performs a comprehensive evaluation of four neural network architectures (UNet, SegResNet, DynUNet, and SwinUNETR) for lymphoma lesion segmentation from PET/CT images. The networks were trained, validated, and tested on a diverse, multi-institutional dataset of 611 cases. On the internal test set (88 cases; total metabolic tumor volume (TMTV) range [0.52, 2300] ml), SegResNet was the top performer, with a median Dice similarity coefficient (DSC) of 0.76 and a median false positive volume (FPV) of 4.55 ml; all networks had a median false negative volume (FNV) of 0 ml. On the unseen external test set (145 cases; TMTV range [0.10, 2480] ml), SegResNet achieved the best median DSC (0.68) and FPV (21.46 ml), while UNet had the best FNV (0.41 ml). We assessed the reproducibility of six lesion measures, calculated their prediction errors, and examined DSC performance in relation to these measures, offering insights into segmentation accuracy and clinical relevance. Additionally, we introduced three lesion detection criteria addressing the clinical need to identify lesions, count them, and segment them based on their metabolic characteristics. We also performed an expert intra-observer variability analysis, revealing the challenges of segmenting "easy" vs. "hard" cases, to assist in developing more resilient segmentation algorithms. Finally, we performed an inter-observer agreement assessment, underscoring the importance of a standardized ground-truth segmentation protocol involving multiple expert annotators. Code is available at: https://github.com/microsoft/lymphoma-segmentation-dnn
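The per-case metrics quoted in the abstract (TMTV, DSC, FPV, FNV) and the idea of counting lesions can be made concrete with a small sketch. It assumes the connected-component definitions used in the autoPET challenge (FPV: total volume of predicted components that share no voxel with the ground truth; FNV: total volume of ground-truth lesions that share no voxel with the prediction), 26-connectivity, and binary masks stored as Python sets of (z, y, x) voxel indices. These choices and all function names are hypothetical illustrations; the authors' implementation in the linked repository may differ.

```python
from collections import deque
from itertools import product

Voxel = tuple[int, int, int]  # (z, y, x) index of one foreground voxel

# Assumption: 26-connectivity, i.e. every neighbour in the 3x3x3 cube except the voxel itself.
NEIGHBOUR_OFFSETS = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]


def voxel_volume_ml(spacing_mm: tuple[float, float, float]) -> float:
    """Volume of one voxel in ml (1 ml = 1000 mm^3)."""
    sz, sy, sx = spacing_mm
    return sz * sy * sx / 1000.0


def connected_components(mask: set[Voxel]) -> list[set[Voxel]]:
    """Split a binary mask into 26-connected components (one per lesion)."""
    remaining = set(mask)
    components = []
    while remaining:
        seed = remaining.pop()
        component = {seed}
        queue = deque([seed])
        while queue:  # breadth-first flood fill starting from the seed voxel
            z, y, x = queue.popleft()
            for dz, dy, dx in NEIGHBOUR_OFFSETS:
                nb = (z + dz, y + dy, x + dx)
                if nb in remaining:
                    remaining.remove(nb)
                    component.add(nb)
                    queue.append(nb)
        components.append(component)
    return components


def dice(gt: set[Voxel], pred: set[Voxel]) -> float:
    """DSC = 2|G & P| / (|G| + |P|); returns 1.0 when both masks are empty (a convention)."""
    total = len(gt) + len(pred)
    return 1.0 if total == 0 else 2 * len(gt & pred) / total


def unmatched_volume_ml(source: set[Voxel], other: set[Voxel], voxel_ml: float) -> float:
    """Volume of the components of `source` that share no voxel with `other`."""
    return sum(len(c) for c in connected_components(source) if c.isdisjoint(other)) * voxel_ml


def evaluate_case(gt: set[Voxel], pred: set[Voxel],
                  spacing_mm: tuple[float, float, float]) -> dict[str, float]:
    voxel_ml = voxel_volume_ml(spacing_mm)
    return {
        "TMTV_ml": len(gt) * voxel_ml,                       # total ground-truth lesion volume
        "lesion_count": len(connected_components(gt)),      # number of ground-truth lesions
        "DSC": dice(gt, pred),
        "FPV_ml": unmatched_volume_ml(pred, gt, voxel_ml),  # predicted blobs with no GT overlap
        "FNV_ml": unmatched_volume_ml(gt, pred, voxel_ml),  # GT lesions missed entirely
    }


if __name__ == "__main__":
    lesion_a = {(z, y, x) for z in range(2) for y in range(2) for x in range(2)}  # 8 voxels
    lesion_b = {(10, 10, 10)}                                                    # 1 voxel
    false_blob = {(20, 20, x) for x in range(20, 23)}                            # 3 voxels
    gt = lesion_a | lesion_b
    pred = lesion_a | false_blob  # finds lesion A, misses lesion B, adds one false positive
    print(evaluate_case(gt, pred, spacing_mm=(2.0, 2.0, 2.0)))
```

In this toy case the DSC is 2 x 8 / (9 + 11) = 0.8, the missed single-voxel lesion contributes its whole volume to FNV, and the isolated three-voxel blob contributes to FPV. Under these component-level definitions, a prediction that overlaps a lesion even partially adds nothing to FPV or FNV, which is one reason a case can have a median FNV of 0 ml while its DSC stays well below 1.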

翻译：本研究对四种神经网络架构（UNet、SegResNet、DynUNet和SwinUNETR）在PET/CT图像中分割淋巴瘤病变的性能进行了综合评估。这些网络在一个包含611例病例的多样化、多机构数据集上进行了训练、验证和测试。内部测试（88例；总代谢肿瘤体积（TMTV）范围[0.52, 2300]毫升）显示，SegResNet表现最佳，其中位Dice相似系数（DSC）为0.76，中位假阳性体积（FPV）为4.55毫升；所有网络的中位假阴性体积（FNV）均为0毫升。在未见过的外部测试集（145例，TMTV范围[0.10, 2480]毫升）上，SegResNet取得了最佳的中位DSC（0.68）和FPV（21.46毫升），而UNet的最佳FNV为0.41毫升。我们评估了六种病变测量指标的复现性，计算了其预测误差，并研究了DSC性能与这些病变测量指标的关系，从而为分割准确性和临床相关性提供见解。此外，我们引入了三种病变检测标准，以满足临床中识别病变、计数病变以及基于代谢特征进行分割的需求。我们还进行了专家内观察者变异性分析，揭示了分割“简单”与“困难”病例所面临的挑战，以助于开发更具鲁棒性的分割算法。最后，我们进行了观察者间一致性评估，强调了涉及多位专家标注者的标准化真实标注分割协议的重要性。代码地址：https://github.com/microsoft/lymphoma-segmentation-dnn