Many real-world systems are described not only by data from a single source but also by multiple data views. In genomic medicine, for instance, patients can be characterized by data from different molecular layers. Latent variable models with structured sparsity are a commonly used tool for disentangling variation within and across data views. However, interpreting these models is cumbersome, since each factor must be inspected and interpreted directly by domain experts. Here, we propose MuVI, a novel multi-view latent variable model based on a modified horseshoe prior for modeling structured sparsity. This approach facilitates the incorporation of limited and noisy domain knowledge, thereby allowing multi-view data to be analyzed in an inherently explainable manner. We demonstrate that our model (i) outperforms state-of-the-art approaches for modeling structured sparsity in terms of reconstruction error and precision/recall, (ii) robustly integrates noisy domain expertise in the form of feature sets, (iii) promotes the identifiability of factors, and (iv) infers interpretable and biologically meaningful axes of variation in a real-world multi-view dataset of cancer patients.