Multimodal high-dimensional data are increasingly prevalent in biomedical research, yet they are often compromised by block-wise missingness and measurement errors, which pose significant challenges for statistical inference and prediction. We propose AdapDISCOM, a novel adaptive direct sparse regression method that addresses both issues simultaneously. Building on the DISCOM framework, AdapDISCOM introduces modality-specific weighting schemes to account for heterogeneity in data structures and error magnitudes across modalities. We establish the theoretical properties of AdapDISCOM, including model selection consistency and convergence rates under sub-Gaussian and heavy-tailed settings, and develop robust and computationally efficient variants (AdapDISCOM-Huber and Fast-AdapDISCOM). Extensive simulations demonstrate that AdapDISCOM consistently outperforms existing methods such as DISCOM, SCOM, and CoCoLasso, particularly under heterogeneous contamination and heavy-tailed distributions. Finally, we apply AdapDISCOM to data from the Alzheimer's Disease Neuroimaging Initiative (ADNI), demonstrating improved prediction of cognitive scores and reliable selection of established biomarkers, even with substantial missingness and measurement errors. AdapDISCOM provides a flexible, robust, and scalable framework for high-dimensional multimodal data analysis under realistic data imperfections.