Computer vision techniques such as convolutional neural networks (CNNs) are frequently used for medical image recognition. Recently, 3D CNN-based models have come to dominate magnetic resonance imaging (MRI) analytics. Because MRI data closely resemble videos, we conduct extensive empirical studies on applying video recognition techniques to MRI classification. We aim to answer three questions: (1) Can video recognition models be used directly for MRI classification? (2) Which model is most appropriate for MRI? (3) Do common video recognition tricks, such as data augmentation, remain useful for MRI classification? Our experiments use four datasets for Alzheimer's and Parkinson's disease recognition, together with three video recognition models and data augmentation techniques frequently applied to video tasks. The results show that the video framework outperforms 3D-CNN models by 5% to 11% while using 50% to 66% fewer trainable parameters. Our findings suggest that advanced video techniques benefit MRI classification, and this work advances the potential fusion of 3D medical imaging and video understanding research.
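The premise above is that an MRI volume can be handled like a video clip. A minimal sketch, assuming a single-channel volume stored as a NumPy array of shape (D, H, W) and a video model that consumes clips of shape (C, T, H, W), maps the depth axis onto the time axis. It also shows one video-style augmentation, a random temporal crop. The function names `mri_to_clip` and `random_temporal_crop`, the uniform slice sampling, and the channel replication are illustrative assumptions. They are not the preprocessing pipeline used in this paper.

```python
import numpy as np


def mri_to_clip(volume: np.ndarray, num_frames: int, channels: int = 3) -> np.ndarray:
    """Reinterpret a (D, H, W) MRI volume as a (C, T, H, W) video clip.

    Assumption for illustration: depth slices act as video frames,
    `num_frames` slices are sampled uniformly across the depth, and the
    single intensity channel is repeated `channels` times to mimic the
    RGB input many video models expect.
    """
    if volume.ndim != 3:
        raise ValueError(f"expected a (D, H, W) volume, got shape {volume.shape}")
    depth = volume.shape[0]
    # Evenly spaced slice indices from the first to the last slice (inclusive).
    idx = np.linspace(0, depth - 1, num_frames).round().astype(int)
    frames = volume[idx]                                     # (T, H, W)
    clip = np.repeat(frames[np.newaxis], channels, axis=0)   # (C, T, H, W)
    return clip.astype(np.float32)


def random_temporal_crop(clip: np.ndarray, length: int,
                         rng: np.random.Generator) -> np.ndarray:
    """Video-style augmentation: keep a random contiguous run of `length` frames."""
    num_t = clip.shape[1]
    if length > num_t:
        raise ValueError("crop length exceeds the number of frames")
    # integers(low, high) excludes `high`, so the crop always fits.
    start = int(rng.integers(0, num_t - length + 1))
    return clip[:, start:start + length]


if __name__ == "__main__":
    # Hypothetical volume: 160 slices of 128 x 128 voxels.
    vol = np.random.rand(160, 128, 128)
    clip = mri_to_clip(vol, num_frames=16)
    print(clip.shape)  # (3, 16, 128, 128)
    crop = random_temporal_crop(clip, 8, np.random.default_rng(0))
    print(crop.shape)  # (3, 8, 128, 128)
```

Replicating the grayscale channel is only one way to match an RGB-pretrained video backbone. Adapting the model's first convolution to accept one input channel is a common alternative that avoids redundant input data.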