Learning to Collaborate: An Orchestrated-Decentralized Framework for Peer-to-Peer LLM Federation

Fine-tuning Large Language Models (LLMs) for specialized domains is constrained by a fundamental challenge: the need for diverse, cross-organizational data conflicts with the principles of data privacy and sovereignty. While Federated Learning (FL) provides a framework for collaboration without raw data exchange, its classic centralized form introduces a single point of failure and remains vulnerable to model inversion attacks. Decentralized FL (DFL) mitigates this risk by removing the central aggregator, but it typically relies on inefficient, random peer-to-peer (P2P) pairings, forming a collaboration graph that is blind to agent heterogeneity and risks negative transfer. This paper introduces KNEXA-FL, a novel framework for orchestrated decentralization that resolves this trade-off. KNEXA-FL employs a non-aggregating Central Profiler/Matchmaker (CPM) that formulates P2P collaboration as a contextual bandit problem, using a LinUCB algorithm on abstract agent profiles to learn an optimal matchmaking policy. The CPM orchestrates direct knowledge exchange between heterogeneous, PEFT-based LLM agents via secure distillation, without ever accessing the models themselves. Our comprehensive experiments on a challenging code generation task show that KNEXA-FL yields substantial gains, improving Pass@1 by approximately 50% relative to random P2P collaboration. Critically, our orchestrated approach demonstrates stable convergence, in stark contrast to a powerful centralized distillation baseline, which suffers from catastrophic performance collapse. Our work establishes adaptive, learning-based orchestration as a foundational principle for building robust and effective decentralized AI ecosystems.
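To make the contextual-bandit framing concrete, the following is a minimal sketch of LinUCB-style matchmaking, not the KNEXA-FL implementation. It assumes a single shared linear model over candidate pairs, a pair context formed by concatenating two agent profile vectors, and a scalar reward observed after a collaboration round (for example, a change in validation score). The names `LinUCBMatchmaker`, `pair_context`, `alpha`, and `ridge` are hypothetical and chosen for illustration only.

```python
import numpy as np


class LinUCBMatchmaker:
    """Shared-model LinUCB over candidate peer pairs (illustrative sketch)."""

    def __init__(self, dim, alpha=1.0, ridge=1.0):
        self.alpha = alpha              # exploration strength (assumed hyperparameter)
        self.A = ridge * np.eye(dim)    # ridge-regularized design matrix
        self.b = np.zeros(dim)          # reward-weighted sum of contexts

    def score(self, x):
        """Upper confidence bound on the expected reward for context x."""
        A_inv = np.linalg.inv(self.A)
        theta = A_inv @ self.b          # ridge-regression estimate
        bonus = self.alpha * np.sqrt(x @ A_inv @ x)
        return float(theta @ x + bonus)

    def select(self, contexts):
        """Return the index of the candidate pair with the highest UCB score."""
        return int(np.argmax([self.score(x) for x in contexts]))

    def update(self, x, reward):
        """Fold one observed (context, reward) pair into the model."""
        self.A += np.outer(x, x)
        self.b += reward * x


def pair_context(profile_a, profile_b):
    """Hypothetical pair feature: concatenation of two agent profiles."""
    return np.concatenate([profile_a, profile_b])
```

In use, the matchmaker would score every candidate pairing each round, pick the highest-scoring one, and call `update` with the reward reported once the exchange finishes. Only profile vectors and scalar rewards reach the matchmaker in this sketch, which mirrors the non-aggregating role described above; how profiles and rewards are actually constructed is not specified here.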
