Two improved algorithms for sparse generalized canonical correlation analysis

Regularized generalized canonical correlation analysis (RGCCA) generalizes regularized canonical correlation analysis to three or more sets of variables; it is a component-based approach for studying the relationships between several sets of variables. Sparse generalized canonical correlation analysis (SGCCA), proposed in Tenenhaus et al. (2014), combines RGCCA with an ℓ1-penalty. Because the blocks are not necessarily fully connected, SGCCA is a flexible method for analyzing a wide variety of practical problems in fields such as biology, chemistry, sensory analysis, marketing, and food research. In Tenenhaus et al. (2014), an iterative algorithm for SGCCA was designed based on the solution, proposed in Witten et al. (2009), to the subproblem (LM-P1 for short) of maximizing a linear function on the intersection of an ℓ1-norm ball and a unit ℓ2-norm sphere. However, the solution to the subproblem LM-P1 proposed in Witten et al. (2009) is not correct, which may explain why the iterative algorithm for SGCCA is slow and does not always converge. To address this, we first characterize the solution to the subproblem LM-P1, and to the subproblems LM-P2 and LM-P3, which maximize a linear function on the intersection of an ℓ1-norm sphere and a unit ℓ2-norm sphere, and an ℓ1-norm ball and a unit ℓ2-norm sphere, respectively. We then provide more efficient block coordinate descent (BCD) algorithms for SGCCA and its two variants, called SGCCA-BCD1, SGCCA-BCD2 and SGCCA-BCD3, corresponding to the subproblems LM-P1, LM-P2 and LM-P3, respectively, and prove that they all globally converge to their stationary points. We further propose gradient projection (GP) methods for SGCCA and its two variants when using the Horst scheme, called SGCCA-GP1, SGCCA-GP2 and SGCCA-GP3, corresponding to the subproblems LM-P1, LM-P2 and LM-P3, respectively, and prove that they all globally converge to their stationary points.
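
To make the subproblem LM-P1 concrete, the following is a minimal Python sketch of the familiar soft-thresholding-plus-bisection approach to maximizing a linear function a^T x subject to ||x||_1 <= t and ||x||_2 = 1. It is not the characterization derived in this paper, and the abstract states that the earlier solution of this type is not correct in general. The sketch therefore assumes the generic case only (a nonzero vector a with a unique largest-magnitude entry, and t >= 1) and raises an error otherwise. The names `soft_threshold` and `lm_p1_generic` and the parameters `tol` and `max_iter` are hypothetical and chosen for illustration.

```python
import numpy as np


def soft_threshold(a, lam):
    """Elementwise soft-thresholding: sign(a) * max(|a| - lam, 0)."""
    return np.sign(a) * np.maximum(np.abs(a) - lam, 0.0)


def lm_p1_generic(a, t, tol=1e-10, max_iter=200):
    """Sketch: maximize a @ x subject to ||x||_1 <= t and ||x||_2 = 1.

    Assumption (not from the paper): a is nonzero, has a unique
    largest-magnitude entry, and t >= 1. Other cases are rejected.
    """
    a = np.asarray(a, dtype=float)
    if not np.any(a):
        raise ValueError("a must be nonzero")
    if t < 1.0:
        raise ValueError("no unit l2-norm vector has l1-norm below 1")

    # If the normalized vector already satisfies the l1 constraint, it is optimal.
    x = a / np.linalg.norm(a)
    if np.abs(x).sum() <= t:
        return x

    # Otherwise bisect on the threshold lam so that the normalized
    # soft-thresholded vector has l1-norm (approximately) equal to t.
    lo, hi = 0.0, np.abs(a).max()
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        s = soft_threshold(a, mid)
        x = s / np.linalg.norm(s)  # s is nonzero because mid < max|a|
        if np.abs(x).sum() > t:
            lo = mid  # still too dense: increase the threshold
        else:
            hi = mid  # feasible: try a smaller threshold
        if hi - lo < tol:
            break

    s = soft_threshold(a, hi)
    norm = np.linalg.norm(s)
    if norm == 0.0:
        # Happens when the largest entries are tied and t is small;
        # this sketch deliberately does not handle that case.
        raise ValueError("non-generic input: tied largest entries")
    return s / norm
```

For example, `lm_p1_generic([3.0, 1.0, 0.5], 1.2)` returns a unit ℓ2-norm vector whose ℓ1-norm is approximately 1.2. Returning the vector at the upper bisection endpoint `hi` keeps the result feasible, since `hi` is only ever moved to thresholds that satisfy the ℓ1 constraint.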