Biomedical image classification requires capturing biological information from specific feature distributions. In most such applications, the main challenges are the limited availability of samples for diseased cases and the imbalanced nature of the datasets. This article presents a novel multi-head self-attention framework for the vision transformer (ViT) that is capable of capturing the specific image features needed for classification and analysis. The proposed method uses residual connections to accumulate the best attention output in each multi-head attention block. The framework is evaluated on two small datasets: (i) a blood cell classification dataset and (ii) a brain tumor detection dataset of brain MRI images. The results show a significant improvement over the traditional ViT and other convolution-based state-of-the-art classification models.
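To make the idea of carrying attention output across blocks through a residual connection more concrete, the following is a minimal NumPy sketch. It is not the implementation evaluated in this article: the abstract does not specify how the "best" attention output is selected or accumulated, so the sketch assumes the simplest reading, in which each block adds its multi-head attention output to the running token representation. The function names, weight shapes, and toy dimensions are all hypothetical, and layer normalization, the MLP sub-layer, and patch embedding are omitted for brevity.

```python
import numpy as np


def softmax(z, axis=-1):
    """Numerically stable softmax along the given axis."""
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def multi_head_self_attention(x, w_q, w_k, w_v, w_o, num_heads):
    """Standard scaled dot-product self-attention.

    x: (tokens, dim) patch embeddings; each weight matrix: (dim, dim).
    Returns the projected output (tokens, dim) and the attention
    weights (heads, tokens, tokens).
    """
    n, d = x.shape
    head_dim = d // num_heads

    def split(t):
        # (tokens, dim) -> (heads, tokens, head_dim)
        return t.reshape(n, num_heads, head_dim).transpose(1, 0, 2)

    q, k, v = split(x @ w_q), split(x @ w_k), split(x @ w_v)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(head_dim)
    weights = softmax(scores, axis=-1)
    heads = weights @ v
    merged = heads.transpose(1, 0, 2).reshape(n, d)  # concatenate heads
    return merged @ w_o, weights


def residual_attention_stack(x, blocks, num_heads):
    """Assumed accumulation rule: add each block's attention output
    to the running representation via a residual connection."""
    h = x
    for w_q, w_k, w_v, w_o in blocks:
        attn_out, _ = multi_head_self_attention(h, w_q, w_k, w_v, w_o, num_heads)
        h = h + attn_out  # residual connection
    return h


# Toy example with random weights (hypothetical sizes).
rng = np.random.default_rng(0)
dim, tokens, heads, depth = 16, 5, 4, 3
blocks = [
    tuple(rng.normal(scale=0.1, size=(dim, dim)) for _ in range(4))
    for _ in range(depth)
]
patches = rng.normal(size=(tokens, dim))
out = residual_attention_stack(patches, blocks, heads)
print(out.shape)
```

Because each block only adds to the running representation, features attended to in earlier blocks remain available to later ones, which is one plausible motivation for a residual design when training data are scarce.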