An efficient deep neural network to find small objects in large 3D images

Jungkyu Park, Jakub Chłędowski, Stanisław Jastrzębski, Jan Witowski, Yanqi Xu, Linda Du, Sushma Gaddam, Eric Kim, Alana Lewin, Ujas Parikh, Anastasia Plaunova, Sardius Chen, Alexandra Millet, James Park, Kristine Pysarenko, Shalin Patel, Julia Goldberg, Melanie Wegener, Linda Moy, Laura Heacock, Beatriu Reig, Krzysztof J. Geras

3D imaging enables accurate diagnosis by providing spatial information about organ anatomy. However, using 3D images to train AI models is computationally challenging because they consist of 10x or 100x more pixels than their 2D counterparts. To train on high-resolution 3D images, convolutional neural networks typically resort to downsampling them or projecting them to 2D. We propose an effective alternative, a neural network that enables efficient classification of full-resolution 3D medical images. Compared to off-the-shelf convolutional neural networks, our network, 3D Globally-Aware Multiple Instance Classifier (3D-GMIC), uses 77.98%-90.05% less GPU memory and 91.23%-96.02% less computation. Although it is trained only with image-level labels, without segmentation labels, it explains its predictions by providing pixel-level saliency maps. On a dataset collected at NYU Langone Health, including 85,526 patients with full-field 2D mammography (FFDM), synthetic 2D mammography, and 3D mammography, 3D-GMIC achieves an AUC of 0.831 (95% CI: 0.769-0.887) in classifying breasts with malignant findings using 3D mammography. This is comparable to the performance of GMIC on FFDM (AUC 0.816, 95% CI: 0.737-0.878) and synthetic 2D mammography (AUC 0.826, 95% CI: 0.754-0.884), which demonstrates that 3D-GMIC successfully classifies large 3D images despite focusing computation on a smaller percentage of its input than GMIC does. 3D-GMIC thus identifies and utilizes extremely small regions of interest in 3D images consisting of hundreds of millions of pixels, dramatically reducing the associated computational challenges. 3D-GMIC also generalizes well to BCS-DBT, an external dataset from Duke University Hospital, achieving an AUC of 0.848 (95% CI: 0.798-0.896).
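
The abstract does not spell out how the small regions of interest are chosen. As a rough illustration of the general idea of spending expensive computation on only a few salient patches of a large volume, the following is a minimal NumPy sketch. The function names (`select_patches`, `crop`), the greedy peak-picking rule, and the use of a saliency map at the same resolution as the volume are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np


def select_patches(saliency, k, patch_shape):
    """Greedily pick k patch corners (z, y, x) around the highest saliency peaks.

    Hypothetical helper: assumes every patch dimension fits inside the volume.
    """
    sal = saliency.astype(float).copy()
    corners = []
    for _ in range(k):
        # Locate the current most salient voxel.
        peak = np.unravel_index(np.argmax(sal), sal.shape)
        # Center a patch on the peak, clipped so it stays inside the volume.
        corner = tuple(
            int(min(max(p - s // 2, 0), dim - s))
            for p, s, dim in zip(peak, patch_shape, sal.shape)
        )
        corners.append(corner)
        # Suppress the chosen region so the next patch looks elsewhere.
        region = tuple(slice(c, c + s) for c, s in zip(corner, patch_shape))
        sal[region] = -np.inf
    return corners


def crop(volume, corners, patch_shape):
    """Stack the selected patches into an array of shape (k, *patch_shape)."""
    return np.stack([
        volume[tuple(slice(c, c + s) for c, s in zip(corner, patch_shape))]
        for corner in corners
    ])


# Toy example: a small volume with two salient voxels.
volume = np.zeros((8, 64, 64))
saliency = np.zeros_like(volume)
saliency[2, 10, 50] = 1.0
saliency[6, 40, 5] = 0.5

patch_shape = (2, 8, 8)
corners = select_patches(saliency, k=2, patch_shape=patch_shape)
patches = crop(volume, corners, patch_shape)
fraction = patches.size / volume.size  # share of voxels a downstream model would see
```

In this toy setup a downstream patch-level model would process under 1% of the voxels. The savings reported in the abstract come from the authors' actual architecture, which this sketch does not reproduce.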
