Decentralised learning enables the training of deep learning algorithms without centralising datasets, yielding benefits such as improved data privacy, greater operational efficiency, and support for data-ownership policies. However, significant data imbalances pose a challenge in this framework. Participants with smaller datasets in distributed learning environments often achieve poorer results than participants with larger datasets. Data imbalances are particularly pronounced in medical fields and are caused by differing patient populations, technological inequalities, and divergent data collection practices. In this paper, we model distributed learning as a Stackelberg evolutionary game. We present two algorithms for setting the weight of each node's contribution to the global model in each training round: the Deterministic Stackelberg Weighting Model (DSWM) and the Adaptive Stackelberg Weighting Model (ASWM). We use three medical datasets to highlight the impact of dynamic weighting on underrepresented nodes in distributed learning. Our results show that the ASWM significantly favours underrepresented nodes, improving their performance by 2.713% in AUC, while nodes with larger datasets experience only a modest average performance decrease of 0.441%.
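To make the aggregation step concrete, here is a minimal sketch of per-round weighted model aggregation. The actual DSWM/ASWM weighting rules are not specified in the abstract, so the `size_proportional_weights` baseline and the `smoothed_weights` alternative below are illustrative placeholders showing how a weighting scheme can shift influence toward nodes with smaller datasets.

```python
# Hypothetical sketch of per-round weighted aggregation in distributed
# learning. The paper's DSWM/ASWM rules are not reproduced here; the
# weighting functions below are illustrative assumptions only.

def aggregate(node_params, weights):
    """Weighted average of each node's parameter vector.

    node_params: list of parameter lists (one per node)
    weights: per-node weights, expected to sum to 1
    """
    assert abs(sum(weights) - 1.0) < 1e-9, "weights must sum to 1"
    dim = len(node_params[0])
    return [sum(w * p[i] for w, p in zip(weights, node_params))
            for i in range(dim)]

def size_proportional_weights(sizes):
    # Baseline: weight each node by its dataset size, so larger
    # nodes dominate the global model (FedAvg-style aggregation).
    total = sum(sizes)
    return [s / total for s in sizes]

def smoothed_weights(sizes, alpha=0.5):
    # Illustrative alternative: dampen size differences (here via a
    # power transform) so underrepresented nodes contribute more
    # than their raw share of the data.
    damped = [s ** alpha for s in sizes]
    total = sum(damped)
    return [d / total for d in damped]

# Example: two nodes, one with 100 samples and one with 400.
sizes = [100, 400]
print(size_proportional_weights(sizes))  # small node gets 0.2
print(smoothed_weights(sizes))           # small node's weight rises to ~0.33
```

With size-proportional weighting the small node's influence is 0.2; the smoothed scheme raises it to roughly 0.33, illustrating how a dynamic weighting policy can favour underrepresented participants without excluding larger ones.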