Large language models (LLMs) have shown strong capabilities across a wide range of tasks, but they also memorize training data, which raises serious privacy and copyright concerns. While prior work has studied memorization during pre-training, memorization during fine-tuning remains largely unexplored. Compared with pre-training, fine-tuning typically involves sensitive data and diverse objectives, and may therefore lead to distinct memorization behaviors and privacy risks. In this work, we conduct the first comprehensive analysis of language models' (LMs') memorization during fine-tuning across tasks. Our studies with open-source LMs and LMs we fine-tuned ourselves, across a variety of tasks, indicate that fine-tuned memorization varies strongly across tasks. We explain this task disparity through sparse coding theory and reveal a strong correlation between memorization and the attention score distribution. We further investigate memorization under multi-task fine-tuning and find that it offers a potential strategy for mitigating fine-tuned memorization.