Large Language Models (LLMs) are increasingly applied in healthcare, yet ensuring their ethical integrity and safety compliance remains a major barrier to clinical deployment. This work introduces a multi-agent refinement framework designed to enhance the safety and reliability of medical LLMs through structured, iterative alignment. Our system combines two generative models, DeepSeek R1 and Med-PaLM, with two evaluation agents, LLaMA 3.1 and Phi-4, which assess responses using the American Medical Association's (AMA) Principles of Medical Ethics and a five-tier Safety Risk Assessment (SRA-5) protocol. We evaluate performance across 900 clinically diverse queries spanning nine ethical domains, measuring convergence efficiency, ethical violation reduction, and domain-specific risk behavior. Results demonstrate that DeepSeek R1 achieves faster convergence (mean 2.34 vs. 2.67 iterations), while Med-PaLM shows superior handling of privacy-sensitive scenarios. The iterative multi-agent loop achieved an 89% reduction in ethical violations and a 92% risk downgrade rate, underscoring the effectiveness of our approach. This study presents a scalable, regulator-aligned, and cost-efficient paradigm for governing medical AI safety.
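The generate-evaluate-revise loop described above can be sketched as follows. This is a minimal illustrative sketch, not the paper's implementation: all agent behavior is stubbed with hypothetical functions (`generator`, `evaluator`), and the acceptance rule (AMA-compliant and SRA-5 tier ≤ 2) is an assumption chosen for illustration; a real deployment would call the underlying LLMs.

```python
from dataclasses import dataclass

@dataclass
class Verdict:
    ethics_ok: bool   # complies with AMA Principles of Medical Ethics
    risk_tier: int    # SRA-5 tier, 1 (minimal) .. 5 (severe)

def generator(query: str, feedback: list[str]) -> str:
    """Hypothetical generator stub: each round of feedback yields a revision."""
    return f"answer(revision={len(feedback)}) to: {query}"

def evaluator(response: str) -> Verdict:
    """Hypothetical evaluator stub: accepts after two revisions."""
    revisions = int(response.split("revision=")[1].split(")")[0])
    return Verdict(ethics_ok=revisions >= 2, risk_tier=max(1, 4 - revisions))

def refine(query: str, evaluators, max_iters: int = 5):
    """Iterate until every evaluator accepts (ethics OK, risk tier <= 2)."""
    feedback: list[str] = []
    response = ""
    for it in range(1, max_iters + 1):
        response = generator(query, feedback)
        verdicts = [ev(response) for ev in evaluators]
        if all(v.ethics_ok and v.risk_tier <= 2 for v in verdicts):
            return response, it
        feedback.append("revise for ethical and safety compliance")
    return response, max_iters

# With the stubs above, the loop converges on the third iteration.
response, iters = refine("Can I double my insulin dose?", [evaluator, evaluator])
```

The convergence metric reported in the abstract (mean iterations to acceptance) corresponds to the `iters` value returned here, averaged over the query set.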