It is increasingly common to collect data of multiple different types on the same set of samples. Our focus is on studying relationships between such multiview features and responses. A motivating application arises in precision medicine, where multi-omics data are collected to correlate with clinical outcomes. The goal is to infer dependence within and across views while combining multimodal information to improve outcome prediction. The signal-to-noise ratio can vary substantially across views, which motivates more nuanced statistical tools than standard late and early fusion. These goals must be met while preserving interpretability, selecting features, and obtaining accurate uncertainty quantification. To address these challenges, we introduce two complementary factor regression models. A baseline Joint Factor Regression (\textsc{jfr}) captures combined variation across views via a single set of factors, while a more nuanced Joint Additive FActor Regression (\textsc{jafar}) decomposes variation into shared and view-specific components. For \textsc{jfr}, we use independent cumulative shrinkage process (\textsc{i-cusp}) priors, while for \textsc{jafar} we develop a dependent version (\textsc{d-cusp}) designed to ensure identifiability of the components. We develop Gibbs samplers that exploit the model structure and accommodate flexible feature and outcome distributions. Prediction of time-to-labor onset from immunome, metabolome, and proteome data illustrates performance gains over state-of-the-art competitors. Our open-source software (an \texttt{R} package) is available at https://github.com/niccoloanceschi/jafar.
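To make the distinction between shared and view-specific variation concrete, the following is a minimal simulation sketch under simplifying assumptions: Gaussian features, fixed numbers of factors, standard normal loadings, and a linear Gaussian response. The function names `simulate_additive_factor_data` and `implied_covariance` and all parameter values are hypothetical. This is not the \textsc{jafar} specification or the API of the package linked above, and it omits the \textsc{cusp}-type priors entirely. It illustrates that, in such an additive model, covariance across views is driven only by the shared loadings, while view-specific factors contribute only to covariance within each view.

```python
import numpy as np


def simulate_additive_factor_data(n, p_views, k_shared, k_specific, seed=0):
    """Simulate M views plus a response from a simplified Gaussian
    shared + view-specific factor model (illustrative assumption only)."""
    rng = np.random.default_rng(seed)
    eta = rng.standard_normal((n, k_shared))          # shared factors, common to all views
    theta = rng.standard_normal(k_shared)             # response coefficients on shared factors
    y = eta @ theta
    X_views, Lambdas, Gammas, psis = [], [], [], []
    for p_m, k_m in zip(p_views, k_specific):
        Lam = rng.standard_normal((p_m, k_shared))    # shared loadings of view m
        Gam = rng.standard_normal((p_m, k_m))         # view-specific loadings of view m
        psi = rng.uniform(0.5, 1.5, size=p_m)         # idiosyncratic noise variances
        phi = rng.standard_normal((n, k_m))           # view-specific factors of view m
        eps = rng.standard_normal((n, p_m)) * np.sqrt(psi)
        X_views.append(eta @ Lam.T + phi @ Gam.T + eps)
        y = y + phi @ rng.standard_normal(k_m)        # response may also load on view-specific factors
        Lambdas.append(Lam)
        Gammas.append(Gam)
        psis.append(psi)
    y = y + rng.standard_normal(n)                    # response noise
    return X_views, y, (Lambdas, Gammas, psis)


def implied_covariance(Lambdas, Gammas, psis):
    """Marginal covariance of the stacked views implied by the sketch above."""
    M = len(Lambdas)
    blocks = []
    for m in range(M):
        row = []
        for l in range(M):
            C = Lambdas[m] @ Lambdas[l].T             # across views: shared loadings only
            if m == l:
                # within a view: add view-specific and idiosyncratic parts
                C = C + Gammas[m] @ Gammas[m].T + np.diag(psis[m])
            row.append(C)
        blocks.append(row)
    return np.block(blocks)


X_views, y, params = simulate_additive_factor_data(
    n=200_000, p_views=(4, 3), k_shared=2, k_specific=(1, 2)
)
Sigma = implied_covariance(*params)
S = np.cov(np.hstack(X_views), rowvar=False)
print(np.abs(S - Sigma).max())  # expected to be small for large n
```

In a single-factor-set model such as \textsc{jfr}, one would instead have only shared-style loadings, so within-view and across-view dependence would not be separated into distinct components.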