Rashomon Sets and Model Multiplicity in Federated Learning

The Rashomon set captures the collection of models that achieve near-identical empirical performance yet may differ substantially in their decision boundaries. Understanding the differences among these models, i.e., their multiplicity, is recognized as a crucial step toward model transparency, fairness, and robustness, as it reveals decision-boundary instabilities that standard metrics obscure. However, existing definitions of the Rashomon set and of multiplicity metrics assume centralized learning and do not extend naturally to decentralized, multi-party settings such as Federated Learning (FL). In FL, multiple clients collaboratively train models under a central server's coordination without sharing raw data, which preserves privacy but introduces challenges from heterogeneous client data distributions and communication constraints. In this setting, the choice of a single best model may homogenize predictive behavior across diverse clients, amplify biases, or undermine fairness guarantees. In this work, we provide the first formalization of Rashomon sets in FL. First, we adapt the Rashomon set definition to FL, distinguishing among three perspectives: (I) a global Rashomon set defined over aggregated statistics across all clients, (II) a t-agreement Rashomon set representing the intersection of local Rashomon sets across a fraction t of clients, and (III) individual Rashomon sets specific to each client's local distribution. Second, we show how standard multiplicity metrics can be estimated under FL's privacy constraints. Finally, we introduce a multiplicity-aware FL pipeline and conduct an empirical study on standard FL benchmark datasets. Our results demonstrate that all three proposed federated Rashomon set definitions offer valuable insights, enabling clients to deploy models that better align with their local data, fairness considerations, and practical requirements.
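The three perspectives can be illustrated with a minimal Python sketch. The abstract does not state the formal definitions, so everything concrete here is an assumption made for illustration: a finite pool of candidate models, an additive tolerance `epsilon` on the best loss, a sample-size-weighted average as the "aggregated statistic" for the global set, and t-agreement read as membership in the local Rashomon sets of at least a fraction `t` of clients. The function names and the toy losses are hypothetical, and the whole computation runs in one process rather than over a real FL protocol.

```python
from typing import List, Sequence, Tuple


def rashomon_set(losses: Sequence[float], epsilon: float) -> List[bool]:
    """Flag every candidate whose loss is within epsilon of the best loss."""
    threshold = min(losses) + epsilon
    return [loss <= threshold for loss in losses]


def federated_rashomon_sets(
    client_losses: Sequence[Sequence[float]],
    client_sizes: Sequence[int],
    epsilon: float,
    t: float,
) -> Tuple[List[bool], List[bool], List[List[bool]]]:
    """Return (global, t-agreement, individual) membership flags per model.

    client_losses[k][m] is the empirical loss of candidate m on client k.
    """
    n_clients = len(client_losses)
    n_models = len(client_losses[0])
    total = sum(client_sizes)

    # (III) Individual sets: each client thresholds its own local losses.
    individual = [rashomon_set(row, epsilon) for row in client_losses]

    # (I) Global set: threshold the size-weighted average loss (assumed aggregate).
    global_losses = [
        sum(size * row[m] for size, row in zip(client_sizes, client_losses)) / total
        for m in range(n_models)
    ]
    global_set = rashomon_set(global_losses, epsilon)

    # (II) t-agreement set: models accepted by at least a fraction t of clients.
    votes = [sum(member[m] for member in individual) for m in range(n_models)]
    agreement = [v >= t * n_clients for v in votes]

    return global_set, agreement, individual


# Toy example: 3 clients, 4 candidate models (made-up losses).
client_losses = [
    [0.10, 0.12, 0.30, 0.11],
    [0.20, 0.21, 0.19, 0.40],
    [0.15, 0.35, 0.16, 0.40],
]
client_sizes = [100, 100, 200]

global_set, agreement, individual = federated_rashomon_sets(
    client_losses, client_sizes, epsilon=0.03, t=0.6
)
print("global:     ", global_set)
print("t-agreement:", agreement)
print("individual: ", individual)
```

In this toy setting the global set contains only the first model, while the t-agreement set with `t=0.6` also admits the second and third, and each client's individual set differs. This is the kind of discrepancy the three definitions are meant to expose; setting `t=1.0` reduces the t-agreement set to the plain intersection of all local sets.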
