Large language models (LLMs) exhibit exceptional performance but pose substantial privacy risks due to training data memorization, particularly in healthcare contexts involving imperfect or privacy-sensitive patient information. We present a hierarchical dual-strategy framework for selective knowledge unlearning that precisely removes specialized knowledge while preserving fundamental medical competencies. Our approach integrates two complementary components: geometric-constrained gradient updates, which selectively modulate target parameters, and concept-aware token-level interventions, which distinguish preservation-critical from unlearning-targeted tokens via a unified four-level medical concept hierarchy. Comprehensive evaluations on the MedMCQA (surgical) and MHQA (anxiety, depression, trauma) datasets demonstrate superior performance, achieving an 82.7% forgetting rate and an 88.5% knowledge preservation rate. Notably, our framework maintains robust privacy guarantees while requiring modification of only 0.1% of parameters, addressing critical needs for regulatory compliance, auditability, and ethical standards in clinical research.
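The core idea of selective unlearning with a small parameter footprint can be illustrated with a toy sketch: perform gradient ascent on a "forget" example, but restrict the update to the few coordinates whose gradients are large for the forget data and small for the retain data. The linear model, the importance score, and all function names below are illustrative assumptions for exposition, not the paper's actual algorithm.

```python
# Toy sketch of selective unlearning via masked gradient ascent.
# A linear model w·x with squared loss stands in for an LLM; the
# selection rule mimics "touch only parameters important for the
# forget set but unimportant for the retain set". Illustrative only.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def grad_sq_loss(w, x, y):
    # gradient of (w·x - y)^2 with respect to w
    err = dot(w, x) - y
    return [2 * err * xi for xi in x]

def select_params(g_forget, g_retain, k):
    # score each coordinate by |forget grad| / (|retain grad| + eps):
    # high score = matters for forgetting, safe for retention
    eps = 1e-8
    scores = [abs(gf) / (abs(gr) + eps)
              for gf, gr in zip(g_forget, g_retain)]
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return set(order[:k])

def unlearn_step(w, forget_xy, retain_xy, lr=0.1, k=1):
    gf = grad_sq_loss(w, *forget_xy)
    gr = grad_sq_loss(w, *retain_xy)
    mask = select_params(gf, gr, k)
    # ascend the forget loss, but only on the selected coordinates
    return [wi + lr * gfi if i in mask else wi
            for i, (wi, gfi) in enumerate(zip(w, gf))]

w = [0.5, -0.2, 0.8, 0.1]
forget = ([1.0, 0.0, 1.0, 0.0], 0.3)   # association to unlearn
retain = ([0.0, 1.0, 0.0, 1.0], 0.4)   # competency to preserve
w2 = unlearn_step(w, forget, retain, k=1)
```

With `k=1`, only one of the four weights moves (a footprint analogous to the paper's 0.1% of parameters): the forget-example loss increases while the retain-example prediction is untouched, since the updated coordinate does not participate in it.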