Training artificial intelligence (AI) models on three-dimensional (3D) image data presents unique challenges compared to the two-dimensional (2D) case. First, the required computational resources are significantly higher. Second, large pretraining datasets are often unavailable, which impedes successful training. In this study, we propose a simple approach for adapting 2D networks that produce an intermediate feature representation so that they can process 3D volumes. Our method sequentially applies such a network to the slices of a 3D volume taken from all orientations. A feature reduction module then combines the extracted slice features into a single representation, which is used for classification. We evaluate our approach on medical classification benchmarks and on a real-world clinical dataset, obtaining results comparable to those of existing methods. Furthermore, by using attention pooling as the feature reduction module, we obtain a weighted importance value for each slice during the forward pass. We show that the slices our approach deems important make it possible to inspect the basis of a model's prediction.
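The pipeline described above (slice the volume along every orientation, encode each slice with a 2D network, pool with attention, classify) can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. "All orientations" is taken to mean the three orthogonal axes. The 2D network is replaced by a hypothetical `toy_encoder_2d` that returns simple intensity statistics. The attention pooling uses one common form (a `tanh` projection followed by a softmax over per-slice scores). All parameters (`V`, `w`, `W_cls`) are random and untrained.

```python
import numpy as np

def extract_slices(volume):
    """Return every 2D slice of a 3D volume along all three orthogonal axes."""
    slices = []
    for axis in range(3):
        for i in range(volume.shape[axis]):
            slices.append(np.take(volume, i, axis=axis))
    return slices

def toy_encoder_2d(slice_2d):
    """Hypothetical stand-in for a (pretrained) 2D network: maps a slice of
    any shape to a fixed-length feature vector (here, intensity statistics)."""
    return np.array([slice_2d.mean(), slice_2d.std(), slice_2d.min(), slice_2d.max()])

def attention_pool(features, V, w):
    """Attention pooling (assumed form): one scalar score per slice,
    softmax-normalised into weights, then a weighted sum of slice features."""
    scores = np.tanh(features @ V) @ w         # shape (num_slices,)
    scores = scores - scores.max()             # numerical stability
    weights = np.exp(scores) / np.exp(scores).sum()
    pooled = weights @ features                # shape (feature_dim,)
    return pooled, weights

rng = np.random.default_rng(0)
volume = rng.random((8, 10, 12))               # toy volume of shape (D, H, W)

# Apply the 2D encoder sequentially to all 8 + 10 + 12 = 30 slices.
features = np.stack([toy_encoder_2d(s) for s in extract_slices(volume)])

feature_dim, hidden_dim, num_classes = features.shape[1], 16, 2
V = rng.normal(size=(feature_dim, hidden_dim))     # untrained, random parameters
w = rng.normal(size=hidden_dim)
W_cls = rng.normal(size=(feature_dim, num_classes))

pooled, weights = attention_pool(features, V, w)
logits = pooled @ W_cls                        # linear classification head

print(features.shape, weights.shape, logits.shape)  # (30, 4) (30,) (2,)
print("most important slice index:", int(weights.argmax()))
```

In a real setting, `toy_encoder_2d` would be replaced by a 2D backbone, and `V`, `w`, and `W_cls` would be trained jointly. Because `weights` is computed in the same forward pass, the per-slice importance values come at no extra cost and can be mapped back to an axis and slice index for inspection.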