Machine learning (ML) models trained on datasets that are owned by different organizations and stored in physically remote databases are valuable in many real-world use cases. However, state regulations or business requirements often prevent transferring the data to a central location, which makes standard ML algorithms difficult to apply. Federated Learning (FL) is a technique that enables models to learn from distributed datasets without revealing the original data. Vertical Federated Learning (VFL) is a type of FL in which the data samples are split by features across several data owners. For instance, in a recommendation task, a user can interact with various sets of items, and the logs of these interactions are stored by different organizations. In this demo paper, we present \emph{Stalactite}, an open-source framework for VFL that provides the functionality needed to build prototypes of VFL systems. It has several advantages over existing frameworks. In particular, it lets researchers focus on algorithms rather than engineering and makes it easy to deploy training in a distributed environment. It implements several VFL algorithms and has a built-in homomorphic encryption layer. We demonstrate its use on real-world recommendation datasets.
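To make the vertical split concrete, the following is a minimal sketch of feature-wise partitioning for a linear scoring model. It does not use the Stalactite API and omits encryption entirely; the party names, user IDs, feature values, and weights are all hypothetical and chosen only for illustration. Each party computes a partial score from its own features, and only these partial scores (never the raw features) are combined.

```python
# Toy illustration of vertical (feature-wise) data partitioning.
# All names and numbers below are hypothetical; this is not Stalactite code.

# Both parties hold records for the same users but different feature columns.
user_ids = ["u1", "u2", "u3"]
party_a = {"u1": [1.0, 0.0], "u2": [0.0, 1.0], "u3": [1.0, 1.0]}  # e.g. interactions with item set A
party_b = {"u1": [2.0], "u2": [0.5], "u3": [1.0]}                 # e.g. interactions with item set B

# Each party keeps the weights for its own features locally.
weights_a = [0.5, -1.0]
weights_b = [2.0]


def partial_score(features, weights):
    """Dot product of one party's local features with its local weights."""
    return sum(x * w for x, w in zip(features, weights))


def vfl_scores(ids):
    """Combine per-party partial scores; raw features never leave a party."""
    return [
        partial_score(party_a[u], weights_a) + partial_score(party_b[u], weights_b)
        for u in ids
    ]


def centralized_scores(ids):
    """Reference: the same linear model evaluated on the concatenated features."""
    full_weights = weights_a + weights_b
    return [partial_score(party_a[u] + party_b[u], full_weights) for u in ids]


if __name__ == "__main__":
    print(vfl_scores(user_ids))
    print(centralized_scores(user_ids))
```

For a linear model the combined partial scores match the centralized result, which is why exchanging intermediate values instead of raw data can suffice. In a realistic setting these intermediate values would additionally be protected, for example by a homomorphic encryption layer of the kind mentioned above.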