Multimodal clinical data are characterized by high dimensionality, heterogeneous representations, and structured missingness, posing significant challenges for predictive modeling, data integration, and interpretability. We propose BIONIC (Bayesian Integration of Nonlinear Incomplete Clinical data), a unified probabilistic framework that integrates heterogeneous multimodal data under missingness through a joint generative-discriminative latent architecture. BIONIC uses pretrained embeddings for complex modalities such as medical images and clinical text, while incorporating structured clinical variables directly within a Bayesian multimodal formulation. The proposed framework enables robust learning in partially observed and semi-supervised settings by explicitly modeling modality-level and variable-level missingness, as well as missing labels. We evaluate BIONIC on three multimodal clinical and biomedical datasets, demonstrating strong and consistent discriminative performance compared to representative multimodal baselines, particularly under incomplete data scenarios. Beyond predictive accuracy, BIONIC provides intrinsic interpretability through its latent structure, enabling population-level analysis of modality relevance and supporting clinically meaningful insight.