Capturing global contextual information plays a critical role in breast ultrasound (BUS) image classification. Although convolutional neural networks (CNNs) have demonstrated reliable performance in tumor classification, they are inherently limited in modeling global and long-range dependencies because convolution operations are local. Vision Transformers capture global contextual information more effectively but may distort local image patterns because of their tokenization operations. In this study, we propose Hybrid-MT-ESTAN, a hybrid multitask deep neural network that performs BUS tumor classification and segmentation using an architecture composed of CNN and Swin Transformer components. The proposed approach was compared with nine BUS classification methods and evaluated using seven quantitative metrics on a dataset of 3,320 BUS images. The results indicate that Hybrid-MT-ESTAN achieved the highest accuracy, sensitivity, and F1 score among the compared methods: 82.7%, 86.4%, and 86.0%, respectively.
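For readers less familiar with the three metrics named in the abstract, the following minimal sketch shows how accuracy, sensitivity, and F1 score are conventionally computed from binary confusion-matrix counts. It is illustrative only: the function name `classification_metrics`, the choice of malignant as the positive class, and the example counts are assumptions made here and are not taken from the study or its dataset.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict[str, float]:
    """Compute accuracy, sensitivity, and F1 from binary confusion counts.

    Assumption: the positive class is "malignant", so `tp` counts
    malignant tumors correctly predicted as malignant.
    """
    total = tp + fp + tn + fn
    if total == 0 or (tp + fn) == 0 or (tp + fp) == 0:
        raise ValueError("counts must include positive samples and positive predictions")

    accuracy = (tp + tn) / total        # fraction of all images classified correctly
    sensitivity = tp / (tp + fn)        # recall on the positive class
    precision = tp / (tp + fp)          # fraction of positive predictions that are correct
    if precision + sensitivity == 0:
        f1 = 0.0                        # no true positives at all
    else:
        f1 = 2 * precision * sensitivity / (precision + sensitivity)

    return {"accuracy": accuracy, "sensitivity": sensitivity, "f1": f1}


# Hypothetical counts for illustration only (not the paper's results).
metrics = classification_metrics(tp=40, fp=10, tn=45, fn=5)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because F1 is the harmonic mean of precision and sensitivity, it penalizes a classifier that attains high sensitivity only by over-predicting the positive class, which is one reason it is commonly reported alongside accuracy when classes are imbalanced.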