CB-HVTNet: A channel-boosted hybrid vision transformer network for lymphocyte assessment in histopathological images

Transformers, owing to their ability to learn long-range dependencies, have overcome the shortcomings of convolutional neural networks (CNNs) in learning a global perspective. They have therefore attracted researchers' attention for several vision-related tasks, including medical diagnosis. However, their multi-head attention module captures only global-level feature representations, which is insufficient for medical images. To address this issue, we propose a Channel Boosted Hybrid Vision Transformer (CB-HVT) that uses transfer learning to generate boosted channels and employs both transformers and CNNs to analyse lymphocytes in histopathological images. The proposed CB-HVT comprises five modules: a channel generation module, a channel exploitation module, a channel merging module, a region-aware module, and a detection and segmentation head, which work together to identify lymphocytes effectively. The channel generation module applies channel boosting through transfer learning to extract diverse channels from different auxiliary learners. In the CB-HVT, these boosted channels are first concatenated and ranked using an attention mechanism in the channel exploitation module. A fusion block in the channel merging module then merges the diverse boosted channels gradually and systematically to improve the network's learned representations. The CB-HVT also employs a proposal network in its region-aware module and a head to identify objects effectively, even in overlapping regions and in the presence of artifacts. We evaluated the proposed CB-HVT on two publicly available datasets for lymphocyte assessment in histopathological images. The results show that CB-HVT outperformed other state-of-the-art detection models and generalizes well, demonstrating its value as a tool for pathologists.
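The concatenate, rank, and merge steps described above can be illustrated with a minimal NumPy sketch. It is not the authors' implementation. The abstract does not specify the attention mechanism or the fusion block, so the sketch assumes a squeeze-and-excitation-style channel gate for ranking and a single 1x1 convolution for merging, whereas the paper describes a gradual merge. All function names, tensor shapes, and the random weights are hypothetical, and random arrays stand in for the outputs of the auxiliary learners.

```python
import numpy as np

def concat_channels(feature_maps):
    """Stack feature maps of shape (C_i, H, W) along the channel axis."""
    return np.concatenate(feature_maps, axis=0)

def channel_attention(x, w1, w2):
    """Assumed SE-style gate: returns one weight in (0, 1) per channel."""
    squeezed = x.mean(axis=(1, 2))                 # global average pool -> (C,)
    hidden = np.maximum(0.0, w1 @ squeezed)        # bottleneck + ReLU -> (C // r,)
    return 1.0 / (1.0 + np.exp(-(w2 @ hidden)))    # sigmoid -> (C,)

def merge_channels(x, w_fuse):
    """1x1 convolution as a channel mix: (C_out, C) with (C, H, W) -> (C_out, H, W)."""
    return np.tensordot(w_fuse, x, axes=([1], [0]))

rng = np.random.default_rng(0)

# Placeholders for channels produced by two auxiliary learners (hypothetical shapes).
cnn_feats = rng.standard_normal((8, 16, 16))
vit_feats = rng.standard_normal((8, 16, 16))

# Channel exploitation: concatenate, then score and rank the channels.
boosted = concat_channels([cnn_feats, vit_feats])  # (16, 16, 16)
num_channels = boosted.shape[0]
w1 = rng.standard_normal((num_channels // 4, num_channels)) * 0.1
w2 = rng.standard_normal((num_channels, num_channels // 4)) * 0.1
weights = channel_attention(boosted, w1, w2)
ranking = np.argsort(-weights)                     # channel indices, most important first
reweighted = boosted * weights[:, None, None]      # scale each channel by its weight

# Channel merging: reduce 16 boosted channels to 8 in one step (a simplification).
w_fuse = rng.standard_normal((8, num_channels)) * 0.1
merged = merge_channels(reweighted, w_fuse)

print(boosted.shape, merged.shape)
```

In a trained network the weights `w1`, `w2`, and `w_fuse` would be learned rather than sampled, and a gradual merge could be approximated by chaining several `merge_channels` calls with progressively smaller output widths.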
