LUMINA: A Multi-Vendor Mammography Benchmark with Energy Harmonization Protocol

Hongyi Pan, Gorkem Durak, Halil Ertugrul Aktas, Andrea M. Bejar, Baver Tutun, Emre Uysal, Ezgi Bulbul, Mehmet Fatih Dogan, Berrin Erok, Berna Akkus Yildirim, Sukru Mehmet Erturk, Ulas Bagci

Source: arXiv. This paper was accepted to CVPR 2026.

Publicly available full-field digital mammography (FFDM) datasets remain limited in size, clinical labels, and vendor diversity, which hinders the training of robust models. We present LUMINA, a curated, multi-vendor FFDM dataset that explicitly encodes acquisition energy and vendor metadata to expose clinically relevant appearance shifts that current benchmarks overlook. The dataset comprises 1824 images (960 benign, 864 malignant) from 468 patients, with pathology-confirmed outcomes, BI-RADS assessments, and breast-density annotations. It spans six acquisition systems and both high- and low-energy styles. To reduce cross-vendor and cross-energy drift while preserving lesion morphology, we introduce a foreground-only, pixel-space alignment ("energy harmonization") that aligns each image to a low-energy reference style and leaves the zero-valued background unchanged. We benchmark modern CNN and transformer baselines on three clinically meaningful tasks: benign-vs-malignant diagnosis, BI-RADS risk grouping, and breast-density classification. Evaluating single-view and two-view models under a common protocol, we show that two-view models consistently outperform single-view models; in our benchmark, EfficientNet-B0 attains an AUC of 93.54% for diagnosis, and Swin-T yields the best macro-AUC of 89.43% for density. Harmonization improves AUC and accuracy (ACC) across backbones and yields more focal Grad-CAM localization around suspicious regions. LUMINA thus provides (a) a vendor-diverse, energy-labeled benchmark and (b) a model-agnostic harmonization protocol, which together support reliable, deployable mammography AI.
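The abstract does not spell out how the foreground-only alignment is computed, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes that the foreground is every pixel with a value greater than zero and that histogram (quantile) matching is an acceptable stand-in for the alignment step. The function name `harmonize_foreground` and its arguments are hypothetical and are not taken from the paper.

```python
import numpy as np


def harmonize_foreground(image, reference):
    """Map foreground intensities of `image` onto those of `reference`.

    Assumptions (not from the paper): foreground means pixels > 0, and
    quantile (histogram) matching stands in for the alignment step.
    Zero-valued background pixels are returned unchanged.
    """
    image = np.asarray(image, dtype=np.float64)
    reference = np.asarray(reference, dtype=np.float64)

    fg_mask = image > 0                      # breast region of the input
    ref_fg = reference[reference > 0]        # breast region of the reference
    out = image.copy()
    if not fg_mask.any() or ref_fg.size == 0:
        return out                           # nothing to align

    src_fg = image[fg_mask]

    # Empirical CDF of the source foreground, one entry per unique value.
    src_values, inverse, src_counts = np.unique(
        src_fg, return_inverse=True, return_counts=True
    )
    src_cdf = np.cumsum(src_counts) / src_fg.size

    # Empirical CDF of the reference foreground.
    ref_values, ref_counts = np.unique(ref_fg, return_counts=True)
    ref_cdf = np.cumsum(ref_counts) / ref_fg.size

    # For each source quantile, look up the reference intensity at the
    # same quantile. The mapping is monotone, so intensity ordering
    # inside the foreground is preserved.
    mapped_values = np.interp(src_cdf, ref_cdf, ref_values)

    out[fg_mask] = mapped_values[inverse.reshape(-1)]
    return out
```

Because the mapping is monotone and touches only the masked pixels, the relative ordering of intensities inside the breast region is kept and the background stays at zero. Whether this matches the authors' protocol in detail cannot be determined from the abstract alone.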
