Medical language models (LMs) can memorize and reproduce protected health information, but privacy evaluations often focus on recovery of training text rather than disclosure under realistic threat models. We introduce a clinically grounded framework that evaluates leakage along a graded axis of adversarial access, ranging from publicly inferable demographics to leaked note fragments. At each tier, we measure verbatim memorization of patient-specific text and semantic leakage of sensitive diagnoses. Applying the framework to an LM pretrained on 378k clinical notes, we find that routine encounter metadata (i.e., name, date of birth, provider, practice, and visit date) elicits high rates of verbatim memorization across a patient's timeline and enables recovery of sensitive diagnoses (AUROC 0.91 for abortion, 0.81 for HIV). At the same time, exact-match memorization can overstate disclosure: 36% of memorized tokens reflect templated documentation. Our work highlights the risks of training on longitudinal clinical data and provides a practical framework for contextual privacy evaluation of medical LMs.