Generalized correlation analysis (GCA) is concerned with uncovering linear relationships across multiple datasets. It generalizes canonical correlation analysis (CCA), which is designed for two datasets. We study sparse GCA when the data may contain multiple generalized correlation tuples and the loading matrix has a small number of nonzero rows. This setting includes sparse CCA and sparse PCA of correlation matrices as special cases. We first formulate sparse GCA as generalized eigenvalue problems at both the population and sample levels via a careful choice of normalization constraints. Based on a Lagrangian form of the sample optimization problem, we propose a thresholded gradient descent algorithm for estimating GCA loading vectors and matrices in high dimensions. We derive tight estimation error bounds for the estimators generated by the algorithm under proper initialization. We also demonstrate the effectiveness of the algorithm on a number of synthetic datasets.
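The abstract does not spell out the algorithm, so the following Python sketch is only an illustration under stated assumptions, not the authors' method. It assumes a common GCA normalization in which the problem is the generalized eigenvalue problem Σv = λDv, where Σ is the joint covariance of the stacked datasets and D is its block-diagonal part (one block per dataset). With two datasets this is a CCA-type problem. With one standardized variable per dataset, D is the identity, which gives PCA of a correlation matrix. In place of the Lagrangian-based update, it uses a plain truncated gradient ascent on the generalized Rayleigh quotient vᵀΣv / vᵀDv. Each iteration takes a gradient step, keeps the s largest-magnitude entries, and rescales so that vᵀDv = 1. It estimates a single loading vector rather than a loading matrix. The helper names (`thresholded_gradient`, `hard_threshold`, `b_normalize`, `block_diagonal_part`), the step size, and the toy model are hypothetical.

```python
import math

Vector = list[float]
Matrix = list[list[float]]


def matvec(M: Matrix, v: Vector) -> Vector:
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def dot(u: Vector, v: Vector) -> float:
    return sum(a * b for a, b in zip(u, v))


def block_diagonal_part(A: Matrix, sizes: list[int]) -> Matrix:
    """Keep the within-dataset blocks Sigma_kk of A and zero the cross blocks."""
    n = len(A)
    B = [[0.0] * n for _ in range(n)]
    start = 0
    for size in sizes:
        for i in range(start, start + size):
            for j in range(start, start + size):
                B[i][j] = A[i][j]
        start += size
    return B


def hard_threshold(v: Vector, s: int) -> Vector:
    """Keep the s entries of v with largest magnitude; set the rest to zero."""
    keep = set(sorted(range(len(v)), key=lambda i: abs(v[i]), reverse=True)[:s])
    return [x if i in keep else 0.0 for i, x in enumerate(v)]


def b_normalize(v: Vector, B: Matrix) -> Vector:
    """Rescale v so that v^T B v = 1 (assumes B is positive definite, v != 0)."""
    scale = math.sqrt(dot(v, matvec(B, v)))
    return [x / scale for x in v]


def thresholded_gradient(A: Matrix, B: Matrix, v0: Vector, s: int,
                         step: float = 0.1, iters: int = 500) -> tuple[Vector, float]:
    """Truncated gradient ascent on the generalized Rayleigh quotient v^T A v / v^T B v."""
    v = b_normalize(hard_threshold(v0, s), B)
    for _ in range(iters):
        Av, Bv = matvec(A, v), matvec(B, v)
        rho = dot(v, Av)  # equals the Rayleigh quotient because v^T B v = 1
        # Gradient step (the factor 2 is absorbed into step), then sparsify and renormalize.
        v = [x + step * (a - rho * b) for x, a, b in zip(v, Av, Bv)]
        v = b_normalize(hard_threshold(v, s), B)
    return v, dot(v, matvec(A, v))


# Hypothetical toy population model: K = 3 datasets of 3 variables each,
# x_k = a_k z + e_k with a shared scalar factor z, so Sigma = a a^T + I.
sizes = [3, 3, 3]
a = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]
Sigma = [[a[i] * a[j] + (1.0 if i == j else 0.0) for j in range(9)] for i in range(9)]
D = block_diagonal_part(Sigma, sizes)
v0 = [1.0, 0.2, 0.1, 0.9, 0.1, 0.2, 0.8, 0.3, 0.1]  # warm start near the true support
v_hat, rho_hat = thresholded_gradient(Sigma, D, v0, s=3)
print([round(x, 3) for x in v_hat], round(rho_hat, 3))
```

In this toy model, the leading generalized eigenvalue is (Kn + 1)/(n + 1) = 2, with K = 3 datasets and n = ‖a_k‖² = 1, and it is attained at v ∝ a. From the warm start above, the iterates should settle on the support {0, 3, 6} with entries near 1/√6 ≈ 0.408 and ρ ≈ 2. The sketch relies on that warm start. From a poor initialization, hard thresholding can lock onto the wrong support, which is consistent with the abstract's emphasis on proper initialization.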