Breast cancer is a significant health concern affecting millions of women worldwide. Accurate survival risk stratification plays a crucial role in guiding personalised treatment decisions and improving patient outcomes. Here we present BioFusionNet, a deep learning framework that fuses image-derived features with genetic and clinical data to obtain a holistic patient profile and stratify the survival risk of ER+ breast cancer patients. We employ multiple self-supervised feature extractors (DINO and MoCoV3), pretrained on histopathological patches, to capture detailed image features. These features are fused by a variational autoencoder and fed to a self-attention network that generates patient-level features. A co-dual-cross-attention mechanism combines the histopathological features with genetic data, enabling the model to capture the interplay between them. Clinical data are then incorporated through a feed-forward network, further enhancing predictive performance and completing the multimodal feature integration. We also introduce a weighted Cox loss function designed to handle imbalanced survival data, a common challenge. Our model achieves a mean concordance index of 0.77 and a time-dependent area under the curve of 0.84, outperforming state-of-the-art methods. It predicts risk (high versus low) with prognostic significance for overall survival in univariate analysis (HR=2.99, 95% CI: 1.88--4.78, p<0.005), and maintains independent significance in multivariate analysis incorporating standard clinicopathological variables (HR=2.91, 95% CI: 1.80--4.68, p<0.005).
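To make the two quantities named above more concrete, the following is a minimal sketch of a weighted Cox partial-likelihood loss and of Harrell's concordance index. It is an illustration under stated assumptions, not the BioFusionNet implementation: the abstract does not specify how the weights are defined, so the per-sample `weight` argument, the normalisation by the summed event weights, and the function names `weighted_cox_loss` and `concordance_index` are all hypothetical. Tied event times are not given special treatment.

```python
import math


def weighted_cox_loss(risk, time, event, weight=None):
    """Weighted negative log partial likelihood for a Cox model.

    risk   : predicted log-risk scores, one per patient
    time   : observed follow-up times
    event  : 1 if the event was observed, 0 if censored
    weight : hypothetical per-sample weights (assumption; defaults to 1.0)
    """
    n = len(risk)
    if weight is None:
        weight = [1.0] * n
    total, norm = 0.0, 0.0
    for i in range(n):
        if not event[i]:
            continue  # censored patients contribute only through risk sets
        # Risk set: every patient still under observation at time[i].
        at_risk = [risk[j] for j in range(n) if time[j] >= time[i]]
        # Log-sum-exp with max subtraction for numerical stability.
        m = max(at_risk)
        log_denom = m + math.log(sum(math.exp(r - m) for r in at_risk))
        total += weight[i] * (risk[i] - log_denom)
        norm += weight[i]
    # Normalise by the summed event weights (an assumption of this sketch).
    return -total / norm if norm > 0 else 0.0


def concordance_index(risk, time, event):
    """Harrell's C-index: fraction of comparable pairs ranked correctly.

    A pair (i, j) is comparable when patient i had an observed event and
    patient j was followed for longer; ties in risk count as half.
    """
    concordant, comparable = 0.0, 0
    n = len(risk)
    for i in range(n):
        if not event[i]:
            continue
        for j in range(n):
            if time[j] > time[i]:
                comparable += 1
                if risk[i] > risk[j]:
                    concordant += 1.0
                elif risk[i] == risk[j]:
                    concordant += 0.5
    return concordant / comparable if comparable else float("nan")


# Toy cohort: three patients, the last one censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
good_scores = [2.0, 1.0, 0.0]  # higher risk for earlier events
bad_scores = [0.0, 1.0, 2.0]   # ranking reversed

print(concordance_index(good_scores, times, events))
print(weighted_cox_loss(good_scores, times, events))
print(weighted_cox_loss(bad_scores, times, events))
```

In this sketch, up-weighting a sample increases the influence of its event term on the loss, which is one plausible way to counteract an imbalance between observed events and censored cases; the weighting scheme actually used in the paper may differ.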