Large Language Models (LLMs) are increasingly applied in healthcare, yet ensuring their ethical integrity and safety compliance remains a major barrier to clinical deployment. This work introduces a multi-agent refinement framework designed to enhance the safety and reliability of medical LLMs through structured, iterative alignment. Our system combines two generative models, DeepSeek R1 and Med-PaLM, with two evaluation agents, LLaMA 3.1 and Phi-4, which assess responses using the American Medical Association's (AMA) Principles of Medical Ethics and a five-tier Safety Risk Assessment (SRA-5) protocol. We evaluate performance across 900 clinically diverse queries spanning nine ethical domains, measuring convergence efficiency, ethical violation reduction, and domain-specific risk behavior. Results demonstrate that DeepSeek R1 achieves faster convergence (mean 2.34 vs. 2.67 iterations), while Med-PaLM shows superior handling of privacy-sensitive scenarios. The iterative multi-agent loop achieved an 89% reduction in ethical violations and a 92% risk downgrade rate, underscoring the effectiveness of our approach. This study presents a scalable, regulator-aligned, and cost-efficient paradigm for governing medical AI safety.
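The generate-evaluate-revise loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all agent behavior is stubbed with hypothetical functions (`generator`, `evaluator`), and the acceptance rule (AMA-compliant and SRA-5 tier ≤ 2) is an assumption chosen for illustration; a real deployment would call the underlying LLMs.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    ethics_ok: bool   # complies with AMA Principles of Medical Ethics
    risk_tier: int    # SRA-5 tier, 1 (minimal) .. 5 (severe)

def generator(query: str, feedback: list[str]) -> str:
    """Hypothetical generator stub: each round of feedback yields a revision."""
    return f"answer(revision={len(feedback)}) to: {query}"

def evaluator(response: str) -> Verdict:
    """Hypothetical evaluator stub: accepts after two revisions."""
    revisions = int(response.split("revision=")[1].split(")")[0])
    return Verdict(ethics_ok=revisions >= 2, risk_tier=max(1, 4 - revisions))

def refine(query: str, evaluators, max_iters: int = 5):
    """Iterate until every evaluator accepts (ethics OK, risk tier <= 2)."""
    feedback: list[str] = []
    response = ""
    for it in range(1, max_iters + 1):
        response = generator(query, feedback)
        verdicts = [ev(response) for ev in evaluators]
        if all(v.ethics_ok and v.risk_tier <= 2 for v in verdicts):
            return response, it
        feedback.append("revise for ethical and safety compliance")
    return response, max_iters

# With the stubs above, the loop converges on the third iteration.
response, iters = refine("Can I double my insulin dose?", [evaluator, evaluator])
```

The convergence metric reported in the abstract (mean iterations to acceptance) corresponds to the `iters` value returned here, averaged over the query set.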