With the rapid advancement of the digital economy, data collaboration between organizations has become a well-established business model, driving the growth of various industries. However, privacy concerns make direct data sharing impractical. To address this, Two-Party Split Learning, also known as Vertical Federated Learning (VFL), has emerged as a promising solution for secure collaborative learning. Despite its advantages, this architecture still suffers from low computational resource utilization and limited training efficiency. Specifically, its synchronous dependency design increases training latency, while resource and data heterogeneity among participants further hinder efficient computation. To overcome these challenges, we propose PubSub-VFL, a novel VFL paradigm built on a Publisher/Subscriber architecture and optimized for computationally efficient two-party collaborative learning. PubSub-VFL leverages the decoupling capability of the Pub/Sub architecture and the data parallelism of the parameter server architecture to design a hierarchical asynchronous mechanism that reduces training latency and improves system efficiency. Additionally, to mitigate the training imbalance caused by resource and data heterogeneity, we formalize an optimization problem based on participants' system profiles, enabling the selection of optimal hyperparameters while preserving privacy. Our theoretical analysis shows that PubSub-VFL achieves stable convergence and is compatible with security protocols such as differential privacy. Extensive case studies on five benchmark datasets further validate its effectiveness: compared to state-of-the-art baselines, PubSub-VFL not only accelerates training by $2\sim7\times$ without compromising accuracy, but also achieves a computational resource utilization rate of up to 91.07%.