Classic and deep generalized canonical correlation analysis (GCCA) algorithms seek low-dimensional common representations of data entities from multiple ``views'' (e.g., audio and image) using linear transformations and neural networks, respectively. When the views are acquired and stored at different computing agents (e.g., organizations and edge devices) and data sharing is undesirable due to privacy or communication cost considerations, federated learning-based GCCA is well motivated. In federated learning, the views are kept locally at the agents, and only limited, derived information may be exchanged with a central server. However, applying existing GCCA algorithms to such federated learning settings may incur prohibitively high communication overhead. This work puts forth a communication-efficient federated learning framework for both linear and deep GCCA under the maximum variance (MAX-VAR) formulation. The overhead issue is addressed by aggressively compressing (via quantization) the information exchanged between the computing agents and the central server. Our empirical study shows that, compared to the unquantized version, the proposed algorithm substantially reduces communication overhead with virtually no loss in accuracy or convergence speed. Rigorous convergence analyses are also presented, which is a nontrivial effort because generic federated optimization results do not cover the special problem structure of GCCA. Our results show that the proposed algorithms for both linear and deep GCCA converge to critical points at a sublinear rate, even under heavy quantization and stochastic approximations. In addition, in the linear MAX-VAR case, the quantized algorithm approaches a global optimum at a geometric rate under reasonable conditions. Synthetic and real-data experiments showcase the effectiveness of the proposed approach.
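To make the idea of compressing exchanged information via quantization concrete, the following is a minimal sketch of a generic unbiased stochastic uniform quantizer applied to a matrix that an agent might send to the server. It is an illustrative assumption, not the compressor used in this work: the function name `stochastic_quantize`, the per-message min/max scaling, and the 3-bit setting are all hypothetical choices.

```python
import numpy as np

def stochastic_quantize(x, num_bits, rng):
    """Generic stochastic uniform quantizer (illustrative assumption,
    not the scheme from the paper). Each entry is rounded up or down
    to one of 2**num_bits grid points so that the result is unbiased."""
    levels = 2 ** num_bits - 1              # number of grid intervals
    lo, hi = float(x.min()), float(x.max())
    if hi == lo:                            # constant input: nothing to quantize
        return np.full_like(x, lo, dtype=np.float64)
    step = (hi - lo) / levels
    scaled = (x - lo) / step                # position on the grid, in [0, levels]
    floor = np.floor(scaled)
    prob_up = scaled - floor                # round up with this probability
    idx = floor + (rng.random(x.shape) < prob_up)
    idx = np.minimum(idx, levels)           # guard against float round-off
    return lo + idx * step

rng = np.random.default_rng(0)
update = rng.standard_normal((1000, 5))     # stand-in for an exchanged matrix
quantized = stochastic_quantize(update, num_bits=3, rng=rng)

step_size = float((update.max() - update.min()) / (2 ** 3 - 1))
max_error = float(np.max(np.abs(quantized - update)))
```

In this sketch each entry could be transmitted as a 3-bit grid index plus two shared scalars (`lo` and `step`) instead of a full-precision float, and the per-entry error is bounded by one grid step. How the actual framework quantizes and what it exchanges is specified in the body of the paper, not here.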