Canonical correlation analysis (CCA) is a classic statistical method for discovering the latent co-variation that underpins two or more observed random vectors. Several extensions and variations of CCA have been proposed that strengthen our ability to reveal common random factors from multiview datasets. In this work, we first revisit the most recent deterministic extensions of deep CCA and highlight the strengths and limitations of these state-of-the-art methods. Some methods allow trivial solutions, while others can miss weak common factors. Still others also seek to reveal what is not common among the views, i.e., the private components needed to fully reconstruct each view, which overloads the problem and increases its computational and sample complexities. To address these limitations, we design a novel and efficient formulation that alleviates some of the current restrictions. The main idea is to model the private components as conditionally independent given the common ones, which enables the proposed compact formulation. In addition, we provide a sufficient condition for identifying the common random factors. Judicious experiments with synthetic and real datasets showcase the validity of our claims and the effectiveness of the proposed approach.