Medical ultrasound (US) imaging has become a prominent modality for breast cancer imaging due to its ease of use, low cost, and safety. In the past decade, convolutional neural networks (CNNs) have emerged as the method of choice in vision applications and have shown excellent potential in the automatic classification of US images. Despite their success, their restricted local receptive field limits their ability to learn global context information. Recently, Vision Transformer (ViT) designs based on self-attention between image patches have shown great potential as an alternative to CNNs. In this study, for the first time, we utilize ViT to classify breast US images using different augmentation strategies. The results are reported as classification accuracy and Area Under the Curve (AUC) metrics, and the performance is compared with that of state-of-the-art CNNs. The results indicate that the ViT models perform comparably to, or even better than, the CNNs in the classification of breast US images.