Hierarchy-Aware Multimodal Unlearning for Medical AI

Pretrained Multimodal Large Language Models (MLLMs) are increasingly deployed in sensitive domains such as medical AI, where privacy regulations like HIPAA and GDPR require the targeted removal of data belonging to specific individuals or institutions. This motivates machine unlearning, which aims to remove the influence of target data from a trained model. However, existing unlearning benchmarks do not reflect the hierarchical and multimodal structure of real-world medical data, which limits their ability to evaluate unlearning in practice. We therefore introduce MedForget, a hierarchy-aware multimodal unlearning benchmark that models hospital data as a nested structure, enabling fine-grained evaluation of multimodal unlearning across retain and forget splits. Experiments with current unlearning methods show that they struggle to achieve effective hierarchy-aware forgetting without degrading downstream medical utility. To address this limitation, we propose Cross-modal Hierarchy-Informed Projection for unlearning (CHIP), a training-free, hierarchy-aware multimodal unlearning method that deletes information by selectively removing target-specific weight subspaces while preserving information shared with sibling nodes. CHIP achieves the highest forget-retain performance gap across all hierarchy levels while maintaining competitive downstream utility compared to existing methods. Overall, MedForget provides a practical, HIPAA-aligned benchmark for evaluating structured multimodal unlearning on medical data, and CHIP offers an effective and general solution for hierarchy-aware forgetting that balances deletion with utility.
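
To make the idea of a nested structure with forget and retain splits concrete, the following minimal Python sketch shows how a forget request at one level of a hierarchy might induce the two splits. It is an illustration only, not MedForget's actual data format: the level names (`institution`, `patient`, `study`), the `Node` class, and the helper functions are all hypothetical, and node names are assumed to be unique.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    """One node in a hypothetical nested hospital-data hierarchy."""
    name: str
    level: str  # hypothetical level label, e.g. "institution", "patient", "study"
    children: List["Node"] = field(default_factory=list)

    def leaves(self) -> List["Node"]:
        """Return all leaf nodes (the actual data items) under this node."""
        if not self.children:
            return [self]
        found: List["Node"] = []
        for child in self.children:
            found.extend(child.leaves())
        return found


def find(root: Node, name: str) -> Optional[Node]:
    """Depth-first search for a node by name (names assumed unique)."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def split_forget_retain(root: Node, target_name: str) -> Tuple[Set[str], Set[str]]:
    """Forget every leaf under the target node; retain all other leaves."""
    target = find(root, target_name)
    if target is None:
        raise KeyError(target_name)
    forget = {leaf.name for leaf in target.leaves()}
    retain = {leaf.name for leaf in root.leaves()} - forget
    return forget, retain


# A toy hierarchy with made-up names.
root = Node("all_data", "root", [
    Node("hospital_a", "institution", [
        Node("patient_1", "patient", [
            Node("study_1a", "study"),
            Node("study_1b", "study"),
        ]),
        Node("patient_2", "patient", [Node("study_2a", "study")]),
    ]),
    Node("hospital_b", "institution", [
        Node("patient_3", "patient", [Node("study_3a", "study")]),
    ]),
])

# Forgetting one patient leaves the sibling patient in the retain split.
patient_forget, patient_retain = split_forget_retain(root, "patient_1")

# Forgetting a whole institution removes every study beneath it.
hospital_forget, hospital_retain = split_forget_retain(root, "hospital_a")
```

In this toy setup, a patient-level request keeps the sibling patient `patient_2` in the retain split, which is the kind of sibling-shared information a hierarchy-aware method would aim to preserve.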
