Large Language Models (LLMs) are increasingly adopted in healthcare to support clinical decision-making, summarize electronic health records (EHRs), and enhance patient care. However, this integration introduces significant privacy and security challenges, driven by the sensitivity of clinical data and the high-stakes nature of medical workflows. These risks become even more pronounced across heterogeneous deployment environments, ranging from small on-premise hospital systems to regional health networks, each with its own resource limitations and regulatory demands. This Systematization of Knowledge (SoK) examines the evolving threat landscape across the three core LLM phases (data preprocessing, fine-tuning, and inference) within realistic healthcare settings. We present a detailed threat model that characterizes adversaries, capabilities, and attack surfaces at each phase, and we systematize how existing privacy-preserving techniques (PPTs) attempt to mitigate these vulnerabilities. While existing defenses show promise, our analysis identifies persistent limitations in securing sensitive clinical data across diverse operational tiers. We conclude with phase-aware recommendations and future research directions aimed at strengthening privacy guarantees for LLMs in regulated environments. This work provides a foundation for understanding the intersection of LLMs, threats, and privacy in healthcare, offering a roadmap toward more robust and clinically trustworthy AI systems.