An important direction for Federated Foundation Models (FedFMs) is leveraging the data held by small client models to enhance the performance of a large server-side foundation model. Existing methods based on model-level or representation-level knowledge transfer either require expensive local training or incur high communication costs, and both introduce unavoidable privacy risks. We reformulate this problem as a reinforcement-learning-style evaluation process and propose FedGRPO, a privacy-preserving framework comprising two modules. The first module performs competence-based expert selection: it builds a lightweight confidence graph from auxiliary data to identify the most suitable clients for each question. The second module adopts the group-relative idea from Group Relative Policy Optimization (GRPO): it packages each question together with its solution rationale into candidate policies, dispatches these policies to a selected subset of expert clients, and aggregates only the resulting scalar reward signals via a federated group-relative loss. By exchanging reward values instead of data or model updates, FedGRPO reduces privacy risk and communication overhead while enabling parallel evaluation across heterogeneous devices. Empirical results on diverse domain tasks demonstrate that FedGRPO achieves superior downstream accuracy and communication efficiency compared with conventional FedFM baselines.
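To make the first module concrete, below is a minimal sketch of competence-based expert selection, assuming the server scores each client on a small labeled auxiliary set partitioned by domain. The confidence-graph construction, the `build_confidence_graph` and `select_experts` names, and the top-k selection rule are all illustrative assumptions, not the paper's actual procedure.

```python
# Sketch of Module 1: competence-based expert selection (assumed design).
# aux_correct[c, d] is the fraction of auxiliary items in domain d that
# client c answered correctly; this is a stand-in for the paper's
# "lightweight confidence graph" built from auxiliary data.
import numpy as np

def build_confidence_graph(aux_correct: np.ndarray) -> np.ndarray:
    """Normalize per-domain accuracies into a confidence distribution
    over clients; shape (num_clients, num_domains)."""
    return aux_correct / (aux_correct.sum(axis=0, keepdims=True) + 1e-8)

def select_experts(conf: np.ndarray, domain: int, k: int = 3) -> np.ndarray:
    """Pick the k clients with the highest confidence for the
    question's domain (a simple top-k rule, assumed here)."""
    return np.argsort(conf[:, domain])[::-1][:k]

# Toy usage: 4 clients, 2 domains (e.g., math vs. medicine).
aux_correct = np.array([[0.9, 0.2],
                        [0.4, 0.8],
                        [0.7, 0.6],
                        [0.1, 0.9]])
conf = build_confidence_graph(aux_correct)
print(select_experts(conf, domain=1, k=2))  # -> [3 1]
```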
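The second module's aggregation step can likewise be sketched. Assuming each selected expert returns one scalar reward per candidate rationale, a GRPO-style update averages the rewards across experts, normalizes them within the candidate group into advantages, and uses them to weight the server model's log-probabilities. The tensor shapes, the cross-expert mean, and the `federated_group_relative_loss` name below are assumptions for illustration, not the paper's implementation.

```python
# Sketch of Module 2: server-side federated group-relative loss (assumed).
import torch

def federated_group_relative_loss(
    client_rewards: torch.Tensor,  # (num_experts, group_size) scalar rewards
    logprobs: torch.Tensor,        # (group_size,) server log p(rationale | question)
    eps: float = 1e-8,
) -> torch.Tensor:
    # Aggregate the scalar reward signals across the selected experts
    # (a simple mean is assumed here).
    rewards = client_rewards.mean(dim=0)
    # Group-relative advantage, as in GRPO: normalize within the group.
    adv = (rewards - rewards.mean()) / (rewards.std() + eps)
    # Policy-gradient surrogate; gradients flow only through the server model.
    return -(adv.detach() * logprobs).mean()

# Toy usage: 3 experts each score a group of 4 candidate rationales.
rewards = torch.tensor([[1.0, 0.0, 1.0, 0.0],
                        [1.0, 0.0, 0.0, 0.0],
                        [1.0, 1.0, 0.0, 0.0]])
logprobs = torch.randn(4, requires_grad=True)
loss = federated_group_relative_loss(rewards, logprobs)
loss.backward()
```

Note that only the num_experts x group_size reward scalars cross the network in this step, so the per-round communication cost is independent of model size, which is consistent with the abstract's claim of reduced communication overhead.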