Canonical correlation analysis is a classic multivariate statistical method for studying the relationships between two sets of variables. These relationships can be visualised by means of a biplot of the between-set correlation matrix. Canonical analysis provides a low-rank approximation to the between-set correlation matrix that is optimal in a generalised least squares sense. This article proposes adjusting the between-set correlation matrix using either a single scalar effect, or column and/or row effects. An alternating generalised least squares algorithm is proposed to obtain the optimal adjustments and low-rank factorisations. The adjustment yields a better approximation of the between-set correlation matrix, achieving a lower root mean squared error than classic canonical analysis. The results of the adjusted analysis can be efficiently visualised using biplots, with a minimal change in the interpretation rules that affects only the biplot origin. Biplot calibration is used to enhance the visualisation of the results of the adjusted analysis. Examples using publicly available data sets from social science, geochemistry and medical science illustrate the proposed improvement. Software for carrying out the adjusted canonical analysis in the R environment is provided.
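As a rough illustration of the adjustment idea, the generalised least squares criterion and one possible alternating scheme for the scalar case can be written as follows. The notation is assumed here and does not come from the abstract: $R_{xy}$ is the $p \times q$ between-set correlation matrix, $R_{xx}$ and $R_{yy}$ are the within-set correlation matrices, $F$ and $G$ are rank-$k$ factor matrices, $\delta$ is the scalar effect, and $r$ and $c$ are row and column effects. The update steps are a sketch of how such an alternating scheme might proceed, not necessarily the algorithm used in the article.

```latex
% Generalised least squares loss for an approximation \hat{R} of R_{xy}
% (notation assumed for illustration)
\begin{align}
  f(\hat{R}) &= \operatorname{tr}\!\left[ R_{xx}^{-1}\,(R_{xy}-\hat{R})\,
                R_{yy}^{-1}\,(R_{xy}-\hat{R})^{\top} \right], \\[4pt]
% Candidate models: classic, scalar-adjusted, row- and column-adjusted
  \hat{R}_{\text{classic}} &= F G^{\top}, \qquad
  \hat{R}_{\text{scalar}} = \delta\,\mathbf{1}_p \mathbf{1}_q^{\top} + F G^{\top}, \qquad
  \hat{R}_{\text{row+col}} = r\,\mathbf{1}_q^{\top} + \mathbf{1}_p\,c^{\top} + F G^{\top}. \\[4pt]
% Step 1 (scalar case): with F, G fixed and E = R_{xy} - F G^{\top},
% the minimiser of f over \delta is
  \delta &= \frac{\mathbf{1}_p^{\top} R_{xx}^{-1}\, E\, R_{yy}^{-1} \mathbf{1}_q}
                 {\left(\mathbf{1}_p^{\top} R_{xx}^{-1} \mathbf{1}_p\right)
                  \left(\mathbf{1}_q^{\top} R_{yy}^{-1} \mathbf{1}_q\right)}. \\[4pt]
% Step 2: with \delta fixed, take the rank-k truncated SVD
% U_k D_k V_k^{\top} of the transformed, adjusted matrix
% R_{xx}^{-1/2} (R_{xy} - \delta \mathbf{1}_p \mathbf{1}_q^{\top}) R_{yy}^{-1/2}
% and back-transform:
  F G^{\top} &= R_{xx}^{1/2}\, U_k D_k V_k^{\top}\, R_{yy}^{1/2}.
\end{align}
```

Iterating the two steps until $f$ stops decreasing gives a fit that is never worse than the classic solution, since setting $\delta = 0$ recovers the classic canonical analysis. Row and column effects could be updated analogously in their own steps.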