Independent component analysis (ICA) is a blind source separation method that linearly disentangles independent latent sources from observed data. We investigate a special setting of noisy linear ICA in which the observations are split among different views, each receiving a mixture of shared and individual sources. We prove that the corresponding linear structure is identifiable and that the source distribution can be recovered. To estimate the sources, we optimize a constrained form of the joint log-likelihood of the observed data across all views. We show empirically that our objective recovers the sources even when the measurements are corrupted by noise. Furthermore, we propose a model selection procedure for recovering the number of shared sources, which we verify empirically. Finally, we apply the proposed model to a challenging real-life application: shared sources estimated from two large transcriptome datasets (the observed data) provided by two different labs (the two views) are used to find a plausible representation of the underlying graph structure.
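
To make the multi-view setting concrete, the following is a minimal simulation sketch of one possible data-generating process, not the estimation procedure itself. The abstract does not give the exact model equations, so several choices here are assumptions: the function name `simulate_views`, the Laplace (non-Gaussian) source distribution, the square Gaussian mixing matrices, and the placement of the noise as an additive term on the observations are all illustrative.

```python
import numpy as np


def simulate_views(n_views=2, n_shared=3, n_individual=2,
                   n_samples=1000, noise_std=0.1, seed=0):
    """Simulate multi-view data with shared and view-specific sources.

    Assumed model (illustrative only): x_d = A_d @ [s_shared; s_d] + noise_d.
    """
    rng = np.random.default_rng(seed)
    k = n_shared + n_individual

    # Shared sources are drawn once and reused in every view.
    shared = rng.laplace(size=(n_shared, n_samples))

    views, mixings = [], []
    for _ in range(n_views):
        # Individual sources are drawn independently for each view.
        individual = rng.laplace(size=(n_individual, n_samples))
        sources = np.vstack([shared, individual])

        # Assumption: a square, randomly drawn mixing matrix per view.
        A = rng.normal(size=(k, k))

        # Assumption: additive Gaussian noise on the observations.
        noise = noise_std * rng.normal(size=(k, n_samples))

        views.append(A @ sources + noise)
        mixings.append(A)
    return shared, views, mixings


if __name__ == "__main__":
    shared, views, mixings = simulate_views()
    for d, x in enumerate(views):
        print(f"view {d}: observations of shape {x.shape}")
```

In this sketch, every view observes the same shared sources through its own mixing matrix, which is what would allow a joint likelihood over all views to tie the views together. With `noise_std=0.0`, inverting each known mixing matrix returns the shared sources exactly; in practice the mixing matrices are unknown and must be estimated.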