The widespread adoption of large-scale machine learning models in recent years highlights the need for distributed computing to achieve efficiency and scalability. This work introduces a novel distributed machine learning paradigm -- \emph{consensus learning} -- which combines classical ensemble methods with consensus protocols deployed in peer-to-peer systems. Consensus learning algorithms consist of two phases: first, participants develop their own models and submit predictions for any new data inputs; second, the individual predictions serve as inputs to a communication phase, which is governed by a consensus protocol. Consensus learning ensures user data privacy, while also inheriting the safety measures against Byzantine attacks from the underlying consensus mechanism. We provide a detailed theoretical analysis for a particular consensus protocol and compare the performance of the consensus learning ensemble with that of centralised ensemble learning algorithms. The discussion is supplemented by numerical simulations, which illustrate the robustness of the algorithms against Byzantine participants.
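The two phases described in the abstract can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names `individual_phase` and `communication_phase`, the threshold classifiers standing in for participants' models, and in particular the update rule (each participant samples a few peers and adopts their strict majority label). This rule is a generic gossip-style stand-in, not necessarily the consensus protocol analysed in the paper, and it models no Byzantine behaviour.

```python
import random
from collections import Counter


def individual_phase(models, x):
    """Phase 1: each participant predicts a label for x with its own model."""
    return [model(x) for model in models]


def communication_phase(predictions, rounds=10, sample_size=3, seed=0):
    """Phase 2 (illustrative rule, an assumption): in every round each
    participant samples peers and adopts their strict majority label,
    keeping its current label when the sample has no strict majority."""
    rng = random.Random(seed)  # seeded so that runs are reproducible
    state = list(predictions)
    n = len(state)
    for _ in range(rounds):
        updated = []
        for i in range(n):
            peers = [j for j in range(n) if j != i]
            k = min(sample_size, len(peers))
            if k == 0:  # a lone participant has nobody to query
                updated.append(state[i])
                continue
            sample = [state[j] for j in rng.sample(peers, k)]
            label, count = Counter(sample).most_common(1)[0]
            updated.append(label if 2 * count > k else state[i])
        state = updated  # synchronous update: all participants switch together
    return state


if __name__ == "__main__":
    # Hypothetical participants: threshold classifiers with different cut-offs.
    models = [lambda x, t=t: int(x > t) for t in (0.2, 0.4, 0.5, 0.6, 0.8)]
    initial = individual_phase(models, 0.55)
    print("initial predictions:", initial)
    print("after communication:", communication_phase(initial))
```

Only predicted labels are passed between the two functions; the models and any training data stay with the participant that owns them. The number of rounds and the sample size are free parameters of this sketch rather than values taken from the paper.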