Fundus fluorescein angiography (FFA) is critical for assessing retinal vascular abnormalities, but its acquisition is invasive and not always feasible. In contrast, color fundus photography (CFP) is non-invasive and widely accessible, which has motivated studies on CFP-to-FFA synthesis. However, prior work relies solely on CFP surface texture, which fundamentally limits its ability to reconstruct functional vascular information and subtle pathological changes. To address this, we propose a novel framework that synthesizes FFA from CFP with structural guidance provided by optical coherence tomography (OCT). We construct a multi-modal retinal imaging dataset with paired CFP, FFA, and OCT from 3,676 patient eyes, the first tri-modally aligned dataset in retinal imaging. To bridge the spatial gap between OCT and the fundus modalities, we propose a Spatially Aligned Cross-Modal Fusion (SACMF) module that projects depth-resolved OCT features onto the fundus plane and injects them into the CFP encoder via adaptive layer normalization. Beyond feature fusion, we introduce Token-wise Cross-Modality Alignment (TCMA), a token-level contrastive learning strategy that explicitly aligns CFP and FFA representations at corresponding spatial positions. Our method outperforms state-of-the-art methods in synthesis quality. Moreover, extensive experiments demonstrate that the FFA images synthesized by our approach yield larger gains in downstream disease diagnosis performance than those of existing methods, highlighting its clinical potential as a non-invasive decision-support tool in routine workflows. The code is available at https://github.com/while-plus/OCT-guide-FFA-Syn.
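The two mechanisms named in the abstract, conditioning via adaptive layer normalization and token-level contrastive alignment, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not give the exact formulation, so the code assumes a common `(1 + scale) * norm(x) + shift` form for the adaptive normalization and an InfoNCE-style loss in which the FFA token at the same spatial index is the positive and all other FFA tokens are negatives. The function names, the temperature value, and the choice to pass `scale` and `shift` in directly (rather than predicting them from projected OCT features with a learned layer) are all hypothetical.

```python
import math


def adaptive_layer_norm(token, scale, shift, eps=1e-5):
    """Normalize one token, then modulate it with conditioning values.

    ASSUMPTION: in a real model, `scale` and `shift` would be predicted
    from OCT features; here they are plain lists supplied by the caller.
    """
    mean = sum(token) / len(token)
    var = sum((x - mean) ** 2 for x in token) / len(token)
    inv_std = 1.0 / math.sqrt(var + eps)
    return [(1.0 + s) * (x - mean) * inv_std + b
            for x, s, b in zip(token, scale, shift)]


def l2_normalize(vec):
    """Scale a vector to unit length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def tokenwise_info_nce(cfp_tokens, ffa_tokens, temperature=0.1):
    """InfoNCE-style loss over spatially corresponding tokens.

    ASSUMPTION: token i of CFP and token i of FFA cover the same spatial
    position; that pair is the positive, all other FFA tokens are negatives.
    """
    if len(cfp_tokens) != len(ffa_tokens):
        raise ValueError("CFP and FFA must have the same number of tokens")
    cfp = [l2_normalize(t) for t in cfp_tokens]
    ffa = [l2_normalize(t) for t in ffa_tokens]
    total = 0.0
    for i, query in enumerate(cfp):
        # Cosine similarity of this CFP token to every FFA token.
        logits = [sum(a * b for a, b in zip(query, key)) / temperature
                  for key in ffa]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        total += log_denom - logits[i]
    return total / len(cfp)


if __name__ == "__main__":
    aligned = [[1.0, 0.0], [0.0, 1.0]]
    shuffled = [[0.0, 1.0], [1.0, 0.0]]
    print("aligned loss: ", tokenwise_info_nce(aligned, aligned))
    print("shuffled loss:", tokenwise_info_nce(aligned, shuffled))
```

Under these assumptions, the loss is small when each CFP token is most similar to the FFA token at the same position and grows when the correspondence is broken, which is the behavior a token-level alignment objective is meant to encourage.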