Federated learning has emerged as a promising distributed learning paradigm that enables multiple parties to learn collaboratively without transferring raw data. However, most existing federated learning studies focus on either the horizontal or the vertical data setting, where the data of different parties are assumed to share the same feature space or the same sample space, respectively. In practice, a common scenario is the hybrid data setting, where data from different parties may differ in both features and samples. To address this, we propose HybridTree, a novel federated learning approach that enables federated tree learning on hybrid data. We observe the existence of consistent split rules in trees. With the help of these split rules, we theoretically show that the knowledge of parties can be incorporated into the lower layers of a tree. Based on our theoretical analysis, we propose a layer-level solution that does not require frequent communication to train a tree. Our experiments demonstrate that HybridTree achieves accuracy comparable to the centralized setting with low computational and communication overhead. HybridTree achieves a speedup of up to 8 times compared with the other baselines.
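To make the distinction between the horizontal, vertical, and hybrid data settings concrete, the following is a minimal illustrative sketch, not part of HybridTree itself. It assumes each party is described only by its set of sample IDs and its set of feature names; the `Party` alias, the `data_setting` function, and the example feature names are hypothetical and introduced here purely for illustration.

```python
from typing import List, Set, Tuple

# Hypothetical representation: (sample IDs held, feature names held).
Party = Tuple[Set[int], Set[str]]


def data_setting(parties: List[Party]) -> str:
    """Classify how data is partitioned across parties (illustrative only)."""
    first_samples, first_features = parties[0]
    same_samples = all(samples == first_samples for samples, _ in parties)
    same_features = all(features == first_features for _, features in parties)

    if same_features and not same_samples:
        return "horizontal"  # shared feature space, different samples
    if same_samples and not same_features:
        return "vertical"    # shared sample space, different features
    if same_samples and same_features:
        return "identical"   # nothing differs across parties
    return "hybrid"          # both samples and features differ


# Made-up example parties for each setting.
horizontal = [({1, 2}, {"age", "income"}), ({3, 4}, {"age", "income"})]
vertical = [({1, 2}, {"age"}), ({1, 2}, {"income"})]
hybrid = [({1, 2, 3}, {"age", "income"}), ({2, 3, 4}, {"income", "balance"})]

for name, parties in [("horizontal", horizontal),
                      ("vertical", vertical),
                      ("hybrid", hybrid)]:
    print(name, "->", data_setting(parties))
```

In the hybrid example, the two parties overlap only partially in both samples and features, which is the case that neither purely horizontal nor purely vertical methods are designed for.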