For a federated learning (FL) model to perform well, it needs a diverse and representative dataset. However, data contributors may care only about performance on a specific subset of the population, which may not reflect the diversity of the wider population. This creates a tension between the principal (the FL platform designer), who cares about global performance, and the agents (the data collectors), who care about local performance. In this work, we formulate this tension as a game between the principal and multiple agents, and focus on the linear experiment design problem to formally study their interaction. We show that both the statistical criterion used to quantify the diversity of the data and the choice of federated learning algorithm have a significant effect on the resulting equilibrium. We leverage this to design simple optimal federated learning mechanisms that encourage data collectors to contribute data representative of the global population, thereby maximizing global performance.
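The tension between local and global performance can be illustrated with a small numerical sketch of linear experiment design. This is not the formulation used in this work: the two-group setup, the function names, and the choice of criteria (D-optimality, A-optimality, and an average prediction variance) are assumptions made purely for illustration. The sketch compares a dataset skewed toward one group with a balanced one, assuming a linear model with unit noise variance and a full-rank design.

```python
import numpy as np

def information_matrix(X):
    """Information matrix X^T X of a linear model (unit noise assumed)."""
    X = np.asarray(X, dtype=float)
    return X.T @ X

def d_criterion(X):
    """log det(X^T X); larger is better (D-optimality)."""
    _, logdet = np.linalg.slogdet(information_matrix(X))
    return float(logdet)

def a_criterion(X):
    """trace((X^T X)^{-1}); smaller is better (A-optimality)."""
    return float(np.trace(np.linalg.inv(information_matrix(X))))

def mean_prediction_variance(X, targets):
    """Average of t^T (X^T X)^{-1} t over target points t; smaller is better."""
    M_inv = np.linalg.inv(information_matrix(X))
    T = np.asarray(targets, dtype=float)
    return float(np.mean(np.einsum("ij,jk,ik->i", T, M_inv, T)))

# Hypothetical population with two groups, each described by one feature direction.
group_a = [1.0, 0.0]
group_b = [0.0, 1.0]

# An agent that cares mostly about group A might collect a skewed dataset;
# the principal would prefer a balanced one of the same size.
skewed = np.array([group_a] * 3 + [group_b])
balanced = np.array([group_a] * 2 + [group_b] * 2)

local_targets = np.array([group_a])            # what the agent cares about
global_targets = np.array([group_a, group_b])  # what the principal cares about

for name, X in [("skewed", skewed), ("balanced", balanced)]:
    print(
        name,
        "local var:", mean_prediction_variance(X, local_targets),
        "global var:", mean_prediction_variance(X, global_targets),
        "D:", d_criterion(X),
        "A:", a_criterion(X),
    )
```

In this toy setup the skewed dataset gives lower prediction variance on the agent's own group, while the balanced dataset gives lower variance on the whole population and scores better on both the D- and A-criteria. That is the kind of misalignment the game in this work is meant to capture.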