Psychiatric narratives encode patient identity not only through explicit identifiers but also through idiosyncratic life events embedded in their clinical structure. Existing de-identification approaches, including PHI masking and LLM-based synthetic rewriting, operate at the text level and offer limited control over which semantic elements are preserved or altered. We introduce Anonpsy, a de-identification framework that reformulates the task as graph-guided semantic rewriting. Anonpsy (1) converts each narrative into a semantic graph encoding clinical entities, temporal anchors, and typed relations; (2) applies graph-constrained perturbations that modify identifying context while preserving clinically essential structure; and (3) regenerates text via graph-conditioned LLM generation. Evaluated on 90 clinician-authored psychiatric case narratives, Anonpsy preserves diagnostic fidelity while achieving consistently low re-identification risk under expert, semantic, and GPT-5-based evaluations. Compared with a strong LLM-only rewriting baseline, Anonpsy yields substantially lower semantic similarity and identifiability. These results demonstrate that explicit structural representations combined with constrained generation provide an effective approach to de-identification for psychiatric narratives.
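The three stages above can be illustrated with a minimal sketch. It is not the Anonpsy implementation: the node kinds (`clinical`, `context`, `time`), the relation names, the substitution table, and all function names are hypothetical and chosen only for illustration. The sketch also assumes that temporal anchors are left unchanged, and it stops at building a prompt rather than calling an LLM.

```python
import random
from dataclasses import dataclass, field, replace
from typing import Dict, List


@dataclass(frozen=True)
class Node:
    node_id: str
    kind: str  # hypothetical kinds: "clinical", "context", "time"
    text: str


@dataclass(frozen=True)
class Edge:
    source: str
    relation: str  # typed relation, e.g. "triggers" (hypothetical label)
    target: str


@dataclass
class SemanticGraph:
    nodes: Dict[str, Node] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_node(self, node_id: str, kind: str, text: str) -> None:
        self.nodes[node_id] = Node(node_id, kind, text)

    def add_edge(self, source: str, relation: str, target: str) -> None:
        self.edges.append(Edge(source, relation, target))


# Hypothetical substitution table for identifying life events.
CONTEXT_SUBSTITUTES = {
    "lost job as a violinist": [
        "lost job as a graphic designer",
        "lost job as a chef",
    ],
}


def build_example() -> SemanticGraph:
    """Stage 1 (hand-built here): a tiny graph for one narrative fragment."""
    g = SemanticGraph()
    g.add_node("e1", "context", "lost job as a violinist")
    g.add_node("c1", "clinical", "depressed mood")
    g.add_node("t1", "time", "three months before admission")
    g.add_edge("e1", "triggers", "c1")
    g.add_edge("c1", "onset_at", "t1")
    return g


def perturb(graph: SemanticGraph, substitutes: Dict[str, List[str]],
            rng: random.Random) -> SemanticGraph:
    """Stage 2: rewrite only context nodes; keep clinical nodes, time nodes, and edges."""
    out = SemanticGraph(edges=list(graph.edges))
    for node_id, node in graph.nodes.items():
        if node.kind == "context" and node.text in substitutes:
            out.nodes[node_id] = replace(node, text=rng.choice(substitutes[node.text]))
        else:
            out.nodes[node_id] = node
    return out


def to_prompt(graph: SemanticGraph) -> str:
    """Stage 3 (input only): serialize the graph as constraints for a text generator."""
    lines = ["Write a clinical narrative consistent with these facts:"]
    for e in graph.edges:
        src, tgt = graph.nodes[e.source].text, graph.nodes[e.target].text
        lines.append(f"- {src} --{e.relation}--> {tgt}")
    return "\n".join(lines)
```

Calling `to_prompt(perturb(build_example(), CONTEXT_SUBSTITUTES, random.Random(0)))` yields a prompt in which the occupation-specific event is replaced while the symptom, its onset, and the relations between them are carried over unchanged. Restricting the perturbation to one node kind is what gives explicit control over which semantic elements are altered.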