Recent developments in regularized Canonical Correlation Analysis (CCA) promise powerful methods for high-dimensional, multiview data analysis. However, justifying the structural assumptions behind many popular approaches remains a challenge, and features of realistic biological datasets pose practical difficulties that are seldom discussed. We propose a novel CCA estimator rooted in an assumption of conditional independencies and based on the Graphical Lasso. Our method has desirable theoretical guarantees and good empirical performance, demonstrated through extensive simulations and real-world biological datasets. Recognizing the difficulties of model selection in high dimensions and other practical challenges of applying CCA in real-world settings, we introduce a novel framework for evaluating and interpreting regularized CCA models in the context of Exploratory Data Analysis (EDA), which we hope will empower researchers and pave the way for wider adoption.