Large language models (LLMs) have transformed the way computers understand and process human language, but deploying them effectively across different organizations remains difficult. When organizations collaborate to improve LLMs, they face three main challenges. First, organizations hesitate to share their valuable data with others. Second, competition between organizations creates trust problems during collaboration. Third, new privacy laws require organizations to delete specific data on request, which is especially difficult when multiple organizations learn from shared data. Traditional federated learning approaches do not address these interconnected challenges, particularly when participants cannot fully trust each other or the central aggregator. To overcome these limitations, we propose a hybrid blockchain-based federated learning framework that uniquely combines public and private blockchain architectures with multi-agent reinforcement learning. Our framework enables transparent sharing of model updates through the public blockchain while protecting sensitive computations in private chains. Each organization operates as an intelligent agent, using Q-learning to optimize its participation strategy and resource allocation, thus aligning individual incentives with collective goals. Notably, we introduce an efficient unlearning mechanism based on Low-Rank Adaptation (LoRA) that enables selective removal of specific data contributions without compromising the model's overall performance. Through extensive experiments on real-world datasets, we demonstrate that our framework effectively balances privacy protection, trust establishment, and regulatory compliance while maintaining high model performance.
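The abstract does not spell out how LoRA-based selective unlearning operates; the following is a minimal sketch of the general idea, assuming (hypothetically, since the paper's aggregation rule is not given here) that the shared adapter update is an unweighted average of per-client low-rank deltas. All names, dimensions, and the three-client setup are illustrative, not the framework's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2  # hypothetical dims: base weight is d x k, adapters are rank r

# Hypothetical per-client LoRA factors (A: d x r, B: r x k)
clients = {name: (rng.normal(size=(d, r)), rng.normal(size=(r, k)))
           for name in ["org_a", "org_b", "org_c"]}

def aggregate(deltas):
    """Average the full-rank products A @ B of each client's LoRA delta."""
    return sum(A @ B for A, B in deltas.values()) / len(deltas)

global_delta = aggregate(clients)

# "Unlearning" org_b: rebuild the aggregate from the remaining clients only,
# so org_b's data contribution no longer influences the shared adapter.
retained = {n: f for n, f in clients.items() if n != "org_b"}
unlearned_delta = aggregate(retained)

# Equivalent closed form: subtract org_b's term and renormalize, which avoids
# touching the other clients' (private) factors at unlearning time.
A_b, B_b = clients["org_b"]
closed_form = (global_delta * len(clients) - A_b @ B_b) / len(retained)
assert np.allclose(unlearned_delta, closed_form)
```

Because the adapter deltas are low-rank and combine additively under this assumed aggregation, one client's contribution can be removed without retraining the full model, which is what makes LoRA attractive for the deletion requests the abstract describes.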