Chest X-ray images are commonly used in medical diagnosis, and AI models have been developed to assist with their interpretation. However, many of these models rely on a single view of the X-ray, even when multiple views are available. In this work, we propose a novel approach for combining information from multiple views to improve the performance of X-ray image classification. Our approach uses a convolutional neural network to extract feature maps from each view, followed by an attention mechanism implemented using a Vision Transformer. The resulting model performs multi-label classification on 41 labels and outperforms both single-view models and traditional multi-view classification architectures. We demonstrate the effectiveness of our approach through experiments on a dataset of 363,000 X-ray images.
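The pipeline described above (per-view feature maps, attention across views, multi-label output) can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes the CNN feature maps are already computed (random stand-ins are used here), replaces the Vision Transformer with a single self-attention layer without positional embeddings, and assumes mean pooling and independent sigmoid outputs for the 41 labels. All function and parameter names are hypothetical.

```python
import math
import random

NUM_LABELS = 41  # number of labels stated in the abstract


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng, scale=0.1):
    """Random weights standing in for trained parameters (assumption)."""
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


def feature_map_to_tokens(feature_map):
    """Flatten an H x W x C feature map into H*W tokens of size C."""
    return [cell for row in feature_map for cell in row]


def self_attention(tokens, wq, wk, wv):
    """Single-head scaled dot-product self-attention over all tokens."""
    q, k, v = matmul(tokens, wq), matmul(tokens, wk), matmul(tokens, wv)
    scale = math.sqrt(len(q[0]))
    scores = matmul(q, list(zip(*k)))  # q @ k^T
    weights = [softmax([s / scale for s in row]) for row in scores]
    return matmul(weights, v), weights


def classify_views(feature_maps, params):
    """Fuse tokens from every view with attention, then score each label."""
    # Tokens from all views share one sequence, so each token can attend
    # to positions in every view.
    tokens = [t for fm in feature_maps for t in feature_map_to_tokens(fm)]
    attended, weights = self_attention(tokens, params["wq"], params["wk"], params["wv"])
    # Mean pooling is an assumption; a class token is another common choice.
    pooled = [sum(col) / len(attended) for col in zip(*attended)]
    logits = matmul([pooled], params["w_out"])[0]
    # Independent sigmoids: labels are not mutually exclusive.
    probs = [1.0 / (1.0 + math.exp(-z)) for z in logits]
    return probs, weights


rng = random.Random(0)
C = 8  # channel width of the stand-in feature maps (hypothetical)


def fake_feature_map(height, width):
    """Random stand-in for the CNN output of one view."""
    return [[[rng.gauss(0.0, 1.0) for _ in range(C)] for _ in range(width)]
            for _ in range(height)]


frontal, lateral = fake_feature_map(2, 2), fake_feature_map(2, 2)
params = {
    "wq": random_matrix(C, C, rng),
    "wk": random_matrix(C, C, rng),
    "wv": random_matrix(C, C, rng),
    "w_out": random_matrix(C, NUM_LABELS, rng),
}
probs, attn = classify_views([frontal, lateral], params)
```

Here `probs` holds one probability per label and `attn` is the 8 x 8 attention matrix over the tokens of both views. Because the tokens are concatenated before attention, the same code accepts one, two, or more views without structural changes, which is one plausible reason to prefer attention over fixed concatenation of per-view features.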