Distributed ensemble learning (DEL) involves training multiple models at distributed learners and then combining their predictions to improve performance. Existing studies focus on DEL algorithm design and optimization but overlook the important issue of incentives, without which self-interested learners may be unwilling to participate in DEL. We aim to fill this gap with the first study of incentive mechanism design for DEL. Our proposed mechanism specifies both the amount of training data and the reward for each learner, where learners have heterogeneous computation and communication costs. One design challenge is to accurately characterize how learners' diversity (in terms of training data) affects ensemble accuracy. To this end, we decompose the ensemble accuracy into a diversity-precision tradeoff that guides the mechanism design. Another challenge is that the mechanism design involves solving a mixed-integer program with a large search space. To address this, we propose an alternating algorithm that iteratively updates each learner's training data size and reward, and we prove that the algorithm converges under mild conditions. Numerical results on the MNIST dataset reveal a counterintuitive finding: our proposed mechanism may prefer a lower level of learner diversity to achieve a higher ensemble accuracy.
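The abstract does not state the accuracy model or the exact update rules, so the following is only a minimal sketch of what an alternating (block-coordinate) scheme over integer data sizes and rewards can look like. Everything in it is a hypothetical assumption: the function names `accuracy_proxy` and `alternating_mechanism`, a per-sample cost `costs[i]` for each learner, a data-size grid in multiples of `step`, a server valuation `value`, and a reward rule that pays each learner exactly its cost. It does not implement the paper's diversity-precision decomposition.

```python
import math


def accuracy_proxy(sizes, scale=200.0):
    """Hypothetical stand-in for ensemble accuracy: concave in total data, 0 with no data."""
    return 1.0 - math.exp(-sum(sizes) / scale)


def alternating_mechanism(costs, max_size=500, step=50, value=10.0, max_rounds=100):
    """Alternate between per-learner data-size updates and reward updates (illustrative only)."""
    n = len(costs)
    sizes = [0] * n
    rewards = [0.0] * n
    grid = range(0, max_size + 1, step)  # integer data sizes allowed for each learner

    for _ in range(max_rounds):
        changed = False
        # Data-size step: update one learner at a time, holding the others fixed.
        # Server profit assumes each reward just covers that learner's cost.
        for i in range(n):
            def profit(d):
                trial = sizes[:i] + [d] + sizes[i + 1:]
                paid = sum(c * s for c, s in zip(costs, trial))
                return value * accuracy_proxy(trial) - paid

            best = max(grid, key=profit)
            # Accept only strict improvements, so profit never decreases.
            if profit(best) > profit(sizes[i]) + 1e-12:
                sizes[i] = best
                changed = True
        # Reward step: the minimal reward that covers each learner's cost.
        rewards = [c * s for c, s in zip(costs, sizes)]
        if not changed:
            break
    return sizes, rewards


if __name__ == "__main__":
    print(alternating_mechanism([0.001, 0.002, 0.05]))
```

In this toy setting every accepted update strictly increases the server's profit over a finite grid, so the loop must stop; the expensive third learner is assigned no data, which loosely echoes the idea that the mechanism need not enlist every learner.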