TRUSWorthy: Toward Clinically Applicable Deep Learning for Confident Detection of Prostate Cancer in Micro-Ultrasound

From arXiv; accepted to IJCARS. This preprint has not undergone post-submission improvements or corrections. To access the Version of Record of this article, see the journal reference below.

While deep learning methods have shown great promise in improving the effectiveness of prostate cancer (PCa) diagnosis by detecting suspicious lesions from trans-rectal ultrasound (TRUS), they must overcome multiple simultaneous challenges. There is high heterogeneity in tissue appearance, significant class imbalance in favor of benign examples, and scarcity in the number and quality of ground truth annotations available to train models. Failure to address even a single one of these problems can result in unacceptable clinical outcomes. We propose TRUSWorthy, a carefully designed, tuned, and integrated system for reliable PCa detection. Our pipeline integrates self-supervised learning, multiple-instance learning aggregation using transformers, random-undersampled boosting, and ensembling: these address label scarcity, weak labels, class imbalance, and overconfidence, respectively. We train and rigorously evaluate our method using a large, multi-center dataset of micro-ultrasound data. Our method outperforms previous state-of-the-art deep learning methods in terms of accuracy and uncertainty calibration, with AUROC and balanced accuracy scores of 79.9% and 71.5%, respectively. On the top 20% of predictions with the highest confidence, we can achieve a balanced accuracy of up to 91%. The success of TRUSWorthy demonstrates the potential of integrated deep learning solutions to meet clinical needs in a highly challenging deployment setting, and is a significant step towards creating a trustworthy system for computer-assisted PCa diagnosis.
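The "top 20% of predictions with the highest confidence" figure can be read as a selective-prediction metric: rank predictions by confidence, keep the most confident fraction, and compute balanced accuracy on that subset only. The sketch below illustrates one way such a number could be computed; it is not the authors' evaluation code. The function names, the confidence definition max(p, 1 - p), and the 0.5 decision threshold are assumptions made for illustration.

```python
import numpy as np


def balanced_accuracy(y_true, y_pred):
    """Mean of sensitivity and specificity for binary labels (1 = cancer, 0 = benign)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    pos = y_true == 1
    neg = y_true == 0
    # Note: if a class is absent from the subset, its rate is NaN.
    sensitivity = np.mean(y_pred[pos] == 1)
    specificity = np.mean(y_pred[neg] == 0)
    return 0.5 * (sensitivity + specificity)


def top_confidence_balanced_accuracy(y_true, prob_cancer,
                                     keep_fraction=0.2, threshold=0.5):
    """Balanced accuracy restricted to the most confident predictions.

    Assumption (not from the paper): confidence is max(p, 1 - p), where p is
    the predicted cancer probability, e.g. the mean over ensemble members.
    """
    y_true = np.asarray(y_true)
    prob = np.asarray(prob_cancer, dtype=float)
    confidence = np.maximum(prob, 1.0 - prob)
    # Keep at least one prediction, rounding the fraction to a whole count.
    n_keep = max(1, int(round(keep_fraction * len(prob))))
    # Indices of the n_keep most confident predictions (descending order).
    keep = np.argsort(-confidence, kind="stable")[:n_keep]
    y_pred = (prob[keep] >= threshold).astype(int)
    return balanced_accuracy(y_true[keep], y_pred)
```

Calling `top_confidence_balanced_accuracy(labels, probs, keep_fraction=0.2)` on held-out labels and predicted probabilities gives the selective metric, while `keep_fraction=1.0` recovers ordinary balanced accuracy over all predictions. A well-calibrated model should score higher as `keep_fraction` shrinks.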
