Federated learning (FL) is the most popular distributed machine learning technique. FL allows machine learning models to be trained without gathering raw data at a single point for processing. Instead, local models are trained on local data; the models are then shared and combined. This approach preserves data privacy because locally trained models, rather than the raw data themselves, are shared. Broadly, FL can be divided into horizontal federated learning (HFL) and vertical federated learning (VFL). In the former, different parties hold different samples over the same set of features; in the latter, different parties hold different features for the same set of samples. In a number of practical scenarios, VFL is more relevant than HFL because different companies (e.g., a bank and a retailer) hold different features (e.g., credit history and shopping history) for the same set of customers. Although VFL is an emerging area of research, it is less well established than HFL. Moreover, VFL-related studies are dispersed, and the connections between them are not obvious. This survey therefore aims to bring these VFL-related studies together in one place. First, we classify existing VFL structures and algorithms. Second, we present the security and privacy threats to VFL. Third, for the benefit of future researchers, we discuss the challenges and prospects of VFL in detail.
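The distinction between HFL and VFL comes down to how one logical table of samples and features is partitioned across parties. The following is a minimal sketch of that partitioning only, not of any training protocol; the dataset, the feature names, and the helper functions `horizontal_split` and `vertical_split` are hypothetical and chosen to mirror the bank and retailer example.

```python
# Toy dataset: rows are samples (customers), columns are features.
# All names and values here are hypothetical, for illustration only.
FEATURES = ["credit_score", "loan_count", "basket_size", "visits_per_month"]
DATA = {
    "alice": [710, 2, 35.0, 4],
    "bob":   [640, 1, 12.5, 9],
    "carol": [780, 0, 58.0, 2],
    "dave":  [590, 3, 20.0, 6],
}


def horizontal_split(data, sample_groups):
    """HFL-style partition: each party holds different samples, all features."""
    return [{sid: list(data[sid]) for sid in group} for group in sample_groups]


def vertical_split(data, features, feature_groups):
    """VFL-style partition: each party holds all samples, different features."""
    parties = []
    for group in feature_groups:
        idx = [features.index(name) for name in group]
        parties.append({sid: [row[i] for i in idx] for sid, row in data.items()})
    return parties


# Two parties with the same feature set but different customers (HFL).
hfl_parties = horizontal_split(DATA, [["alice", "bob"], ["carol", "dave"]])

# A "bank" and a "retailer" with the same customers but different features (VFL).
vfl_parties = vertical_split(
    DATA,
    FEATURES,
    [["credit_score", "loan_count"], ["basket_size", "visits_per_month"]],
)

# HFL: disjoint sample IDs, identical feature width.
print(sorted(hfl_parties[0]), sorted(hfl_parties[1]))
# VFL: identical sample IDs, disjoint feature columns.
print(vfl_parties[0]["alice"], vfl_parties[1]["alice"])
```

In the vertical case, no single party can train on a complete feature vector without coordinating with the others on the shared sample IDs, which is one reason VFL calls for structures and algorithms distinct from those of HFL.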