Recent research has increasingly focused on training large language models (LLMs) with federated learning, a setting known as FedLLM. However, responsible AI (RAI), which aims to ensure safe and trustworthy responses, remains underexplored in this setting. In FedLLM, client-side training data may contain harmful content, so locally trained LLMs can learn to generate inappropriate responses. Aggregating such models into a global model and redistributing it to clients risks deploying unsafe LLMs widely. To address this, we incorporate two well-established RAI techniques into FedLLM: safety filtering and constitutional AI. Our experiments show that these methods significantly improve LLM safety, achieving over 20% improvement on AdvBench.
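As a rough illustration of where safety filtering could sit in a federated round, the sketch below drops flagged examples from each client's data before local training, then combines client weights with a sample-weighted average in the style of FedAvg. This is not the method evaluated in the abstract. The names `BLOCKLIST`, `is_safe`, `filter_client_data`, and `fedavg` are hypothetical, the keyword blocklist stands in for whatever safety classifier a real system would use, and model weights are simplified to plain lists of floats.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]          # (instruction, response)
Weights = Dict[str, List[float]]   # parameter name -> flat list of values

# Hypothetical blocklist; an assumption for illustration only.
# A real safety filter would more likely be a learned classifier.
BLOCKLIST = ("build a bomb", "steal credit card")


def is_safe(example: Example) -> bool:
    """Return True when neither the instruction nor the response hits the blocklist."""
    text = f"{example[0]} {example[1]}".lower()
    return not any(phrase in text for phrase in BLOCKLIST)


def filter_client_data(
    data: List[Example],
    safe: Callable[[Example], bool] = is_safe,
) -> List[Example]:
    """Client-side step: keep only the examples the safety check accepts."""
    return [ex for ex in data if safe(ex)]


def fedavg(updates: List[Tuple[Weights, int]]) -> Weights:
    """Server-side step: average client weights, weighted by sample count."""
    total = sum(n for _, n in updates)
    if total == 0:
        raise ValueError("no training samples reported by any client")
    reference = updates[0][0]
    return {
        name: [
            sum(w[name][i] * n for w, n in updates) / total
            for i in range(len(values))
        ]
        for name, values in reference.items()
    }


if __name__ == "__main__":
    client_data = [
        ("How do I bake bread?", "Mix flour, water, yeast, and salt."),
        ("How to build a bomb", "Step one is ..."),
    ]
    kept = filter_client_data(client_data)
    print(len(kept))  # only the benign example survives

    # Two clients with 1 and 3 (filtered) training samples respectively.
    global_weights = fedavg([({"w": [1.0, 2.0]}, 1), ({"w": [3.0, 4.0]}, 3)])
    print(global_weights)
```

Filtering before local training means each client reports the size of its filtered dataset as its sample count, so clients with mostly harmful data contribute less to the aggregate.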