ISACL: Internal State Analyzer for Copyrighted Training Data Leakage

Large Language Models (LLMs) have revolutionized Natural Language Processing (NLP), but they risk inadvertently exposing copyrighted or proprietary data, especially data that was used for training but never intended for distribution. Traditional methods address such leaks only after content has been generated, by which point sensitive information may already have been exposed. This study introduces a proactive approach: examining an LLM's internal states before text generation to detect potential leaks. Using a curated dataset of copyrighted materials, we train a neural network classifier to identify leakage risk, which allows early intervention by either stopping generation or altering the output to prevent disclosure. Integrated with a Retrieval-Augmented Generation (RAG) system, the framework ensures adherence to copyright and licensing requirements while strengthening data privacy and ethical standards. Our results show that analyzing internal states effectively mitigates the risk of copyrighted data leakage, offering a scalable solution that fits smoothly into existing AI workflows while maintaining high-quality text generation. The implementation is available on GitHub.\footnote{https://github.com/changhu73/Internal_states_leakage}
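
The pre-generation check described above can be illustrated with a minimal sketch. It is not the released implementation: it assumes a single-layer logistic probe in place of the neural network classifier, and it uses synthetic Gaussian vectors in place of hidden states extracted from an LLM. The names `risk_score`, `train_probe`, `guarded_generate`, and `synthetic_states`, the dimension of 8, and the 0.5 threshold are all hypothetical choices made for illustration.

```python
import math
import random

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def risk_score(weights, bias, state):
    """Estimated probability that an internal state signals a leak."""
    return sigmoid(sum(w * x for w, x in zip(weights, state)) + bias)

def train_probe(states, labels, epochs=200, lr=0.5):
    """Fit a logistic-regression probe with full-batch gradient descent."""
    dim = len(states[0])
    n = len(states)
    weights, bias = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * dim, 0.0
        for state, label in zip(states, labels):
            err = risk_score(weights, bias, state) - label
            for i, x in enumerate(state):
                grad_w[i] += err * x
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def guarded_generate(state, generate_fn, weights, bias, threshold=0.5):
    """Call generate_fn only if the pre-generation risk is below threshold."""
    if risk_score(weights, bias, state) >= threshold:
        return "[generation stopped: possible copyrighted content]"
    return generate_fn()

def synthetic_states(n, dim, shift, rng):
    # Assumption: stand-in for real LLM hidden states, drawn as Gaussian
    # noise with the same mean shift in every coordinate.
    return [[rng.gauss(shift, 1.0) for _ in range(dim)] for _ in range(n)]

rng = random.Random(0)
DIM = 8
risky = synthetic_states(100, DIM, 1.0, rng)   # label 1: leak risk
safe = synthetic_states(100, DIM, -1.0, rng)   # label 0: no leak risk
weights, bias = train_probe(risky + safe, [1] * 100 + [0] * 100)

# The gate runs before any text is produced.
reply = guarded_generate(safe[0], lambda: "generated text", weights, bias)
```

In an actual pipeline, `state` would be an activation vector read from the model for the incoming prompt, and `generate_fn` would wrap the model's decoding call. Which layer and token position to read is a design decision that this sketch leaves open.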
