BB-GCN: A Bi-modal Bridged Graph Convolutional Network for Multi-label Chest X-Ray Recognition

Multi-label chest X-ray (CXR) recognition involves simultaneously diagnosing and identifying multiple labels for different pathologies. Because pathological labels carry rich information about their relationships to one another, modeling the co-occurrence dependencies between them is essential for improving recognition performance. However, previous methods rely on state-variable coding and attention mechanisms to model local label information, and they do not learn the global co-occurrence relationships between labels. Furthermore, these methods integrate image features and label embeddings only coarsely, ignoring the alignment and compactness problems in cross-modal vector fusion. To solve these problems, a Bi-modal Bridged Graph Convolutional Network (BB-GCN) model is proposed. The model consists mainly of a backbone module, a pathology Label Co-occurrence relationship Embedding (LCE) module, and a Transformer Bridge Graph (TBG) module. Specifically, the backbone module extracts the visual feature representation of the image. The LCE module uses a graph to model the global co-occurrence relationships between multiple labels and employs graph convolutional networks for learning and inference. The TBG module bridges the cross-modal vectors more compactly and efficiently through the GroupSum method. We evaluated the proposed BB-GCN on two large-scale CXR datasets (ChestX-Ray14 and CheXpert). Our model achieved state-of-the-art performance: the mean AUC scores for the 14 pathologies were 0.835 and 0.813, respectively. Together, the proposed LCE and TBG modules effectively improve the recognition performance of BB-GCN. The model achieves satisfactory results in multi-label chest X-ray recognition and exhibits highly competitive generalization performance.
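To make the idea of a label co-occurrence graph concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It assumes the graph is built from conditional co-occurrence frequencies P(label j | label i) counted over the training annotations, and that one graph-convolution step is ReLU(Â H W) with a row-normalized adjacency Â. The abstract specifies neither choice, and the function names, toy annotations, embeddings, and weights are hypothetical.

```python
def cooccurrence_adjacency(labels):
    """Estimate A[i][j] = P(label j present | label i present).

    labels: list of 0/1 rows, one row per image, one column per pathology.
    The diagonal is set to 1.0 so every node keeps a self-loop.
    """
    n = len(labels[0])
    counts = [[0] * n for _ in range(n)]
    for row in labels:
        for i in range(n):
            if row[i]:
                for j in range(n):
                    if row[j]:
                        counts[i][j] += 1
    adj = [[0.0] * n for _ in range(n)]
    for i in range(n):
        occurrences = counts[i][i]  # number of images carrying label i
        for j in range(n):
            if i == j:
                adj[i][j] = 1.0
            elif occurrences:
                adj[i][j] = counts[i][j] / occurrences
    return adj


def row_normalize(adj):
    """Scale each row to sum to 1 (an assumed, simple normalization)."""
    return [[v / sum(row) for v in row] for row in adj]


def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]


def gcn_layer(a_hat, h, w):
    """One graph convolution step: ReLU(A_hat @ H @ W)."""
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, v) for v in row] for row in z]


# Toy annotations (hypothetical): 4 images, 3 pathologies.
annotations = [
    [1, 1, 0],
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
]
adj = cooccurrence_adjacency(annotations)
a_hat = row_normalize(adj)

# Hypothetical 2-d label embeddings (H) and layer weights (W).
h = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
w = [[0.5, -1.0], [0.25, 0.5]]
out = gcn_layer(a_hat, h, w)  # one updated embedding per label
```

Each row of `out` is a label embedding that has been mixed with the embeddings of the labels it tends to co-occur with, which is the sense in which a graph over labels captures global rather than local dependencies. Note that the conditional-frequency adjacency is asymmetric: in the toy data the third label always appears with the second, but not the other way round.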
