Context-Aware Asymmetric Ensembling for Interpretable Retinopathy of Prematurity Screening via Active Query and Vascular Attention

Retinopathy of Prematurity (ROP) is one of the leading causes of preventable childhood blindness. Automated screening remains challenging, primarily because data are scarce and because the condition is complex, involving both structural staging and microvascular abnormalities. Current deep learning models depend heavily on large private datasets and passive multimodal fusion, and commonly fail to generalize to small, imbalanced public cohorts. We therefore propose the Context-Aware Asymmetric Ensemble Model (CAA Ensemble), which simulates clinical reasoning through two specialized streams. First, the Multi-Scale Active Query Network (MS-AQNet) serves as a structure specialist, using clinical context as dynamic query vectors that spatially steer visual feature extraction to localize the fibrovascular ridge. Second, VascuMIL encodes Vascular Topology Maps (VMAP) within a gated Multiple Instance Learning (MIL) network to identify vascular tortuosity. A synergistic meta-learner ensembles these orthogonal signals to resolve diagnostic discordance across multiple objectives. Evaluated on a highly imbalanced cohort of 188 infants (6,004 images), the framework attained state-of-the-art performance on two distinct clinical tasks: a Macro F1-Score of 0.93 for Broad ROP staging and an AUC of 0.996 for Plus Disease detection. Crucially, the system offers "Glass Box" transparency through counterfactual attention heatmaps and vascular threat maps, which show that clinical metadata directs the model's visual search. This study also demonstrates that architectural inductive bias can help bridge the data gap in medical AI.
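To make the "gated MIL" idea concrete, the following is a minimal sketch of gated attention pooling as it is commonly formulated in the MIL literature. It is an illustration only: the abstract does not specify VascuMIL's actual equations, so the function name `gated_attention_pool`, the weight matrices `V` and `U`, the scoring vector `w`, and all dimensions are assumptions, and the random weights stand in for learned parameters.

```python
import math
import random


def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def gated_attention_pool(instances, V, U, w):
    """Pool a bag of instance embeddings into a single bag embedding.

    instances: list of K embeddings, each a list of D floats
               (hypothetically, one embedding per vascular patch)
    V, U:      L x D weight matrices (tanh branch and sigmoid gate)
    w:         length-L scoring vector
    Returns (bag_embedding, attention_weights).
    """
    scores = []
    for h in instances:
        t = [math.tanh(x) for x in matvec(V, h)]               # content branch
        g = [1.0 / (1.0 + math.exp(-x)) for x in matvec(U, h)]  # gate branch
        scores.append(sum(wi * ti * gi for wi, ti, gi in zip(w, t, g)))

    # Numerically stable softmax over the K instance scores.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    z = sum(exps)
    attn = [e / z for e in exps]

    # Bag embedding is the attention-weighted sum of instance embeddings.
    dim = len(instances[0])
    bag = [sum(a * h[d] for a, h in zip(attn, instances)) for d in range(dim)]
    return bag, attn


# Toy usage with random (untrained) weights; sizes are arbitrary assumptions.
rng = random.Random(0)
D, L, K = 4, 3, 5
V = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
U = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(L)]
w = [rng.uniform(-1, 1) for _ in range(L)]
patches = [[rng.uniform(-1, 1) for _ in range(D)] for _ in range(K)]

bag, attn = gated_attention_pool(patches, V, U, w)
```

In a formulation like this, the per-instance attention weights are what would be rendered as a heatmap over patches, which is one plausible route to the kind of vascular threat map the abstract mentions.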
