WristMIR: Coarse-to-Fine Region-Aware Retrieval of Pediatric Wrist Radiographs with Radiology Report-Driven Learning

Retrieving wrist radiographs with analogous fracture patterns is challenging because clinically important cues are subtle, highly localized, and often obscured by overlapping anatomy or variable imaging views. Progress is further limited by the scarcity of large, well-annotated datasets for case-based medical image retrieval. We introduce WristMIR, a region-aware pediatric wrist radiograph retrieval framework that leverages dense radiology reports and bone-specific localization to learn fine-grained, clinically meaningful image representations without any manual image-level annotations. Using MedGemma-based structured report mining to generate both global and region-level captions, together with pre-processed wrist images and bone-specific crops of the distal radius, distal ulna, and ulnar styloid, WristMIR jointly trains global and local contrastive encoders and performs a two-stage retrieval process: (1) coarse global matching to identify candidate exams, followed by (2) region-conditioned reranking aligned to a predefined anatomical bone region. WristMIR improves retrieval performance over strong vision-language baselines, raising image-to-text Recall@5 from 0.82% to 9.35%. Its embeddings also yield stronger fracture classification (AUROC 0.949, AUPRC 0.953). In region-aware evaluation, the two-stage design markedly improves retrieval-based fracture diagnosis, increasing mean $F_1$ from 0.568 to 0.753, and radiologists rate its retrieved cases as more clinically relevant, with mean scores rising from 3.36 to 4.35. These findings highlight the potential of anatomically guided retrieval to enhance diagnostic reasoning and support clinical decision-making in pediatric musculoskeletal imaging. The source code is publicly available at https://github.com/quin-med-harvard-edu/WristMIR.
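The two-stage retrieval described above can be illustrated with a minimal sketch. It is not the released implementation: the function names, the use of cosine similarity, and the `n_candidates` and `top_k` parameters are assumptions made for illustration, and the embeddings are assumed to have already been produced by the global and region-level encoders.

```python
import numpy as np


def l2_normalize(x):
    """Scale vectors to unit length so that dot products equal cosine similarity."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)


def two_stage_retrieve(q_global, q_region, db_global, db_region,
                       n_candidates=50, top_k=5):
    """Hypothetical coarse-to-fine retrieval.

    q_global:  (d,)   global embedding of the query exam
    q_region:  (r,)   embedding of the query's selected bone region
    db_global: (n, d) global embeddings of the database exams
    db_region: (n, r) embeddings of the same bone region in each database exam
    Returns database indices ordered by region-level similarity.
    """
    # Stage 1: coarse global matching selects a candidate pool.
    global_sims = l2_normalize(db_global) @ l2_normalize(q_global)
    n_candidates = min(n_candidates, global_sims.shape[0])
    candidates = np.argsort(-global_sims)[:n_candidates]

    # Stage 2: rerank only the candidates using the chosen bone region.
    region_sims = l2_normalize(db_region[candidates]) @ l2_normalize(q_region)
    order = np.argsort(-region_sims)[:top_k]
    return candidates[order]
```

In this sketch an exam that matches the query region well but falls outside the stage-1 candidate pool is never returned, so the size of the pool trades recall against reranking cost.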
