In recent years, large language models (LLMs), exemplified by the GPT series, have advanced significantly. To improve task execution, users often hold multi-round conversations with GPT models hosted in the cloud. These conversations, which may contain private information, must be transmitted to and stored in the cloud. This operational paradigm, however, introduces additional attack surfaces. In this paper, we first introduce a Conversation Reconstruction Attack targeting GPT models. The attack consists of two steps: hijacking a session and reconstructing the conversations. We then thoroughly evaluate the privacy risks to conversations when GPT models are subjected to this attack, and find that GPT-4 shows some robustness to it. To better reconstruct previous conversations, we therefore introduce two advanced attacks, the UNR attack and the PBU attack. Our experiments show that the PBU attack performs well across all models, achieving semantic similarity scores above 0.60, whereas the UNR attack is effective only on GPT-3.5. Our results highlight the privacy risks associated with conversations involving GPT models, and we aim to draw the community's attention to preventing the potential misuse of these models' remarkable capabilities. We will responsibly disclose our findings to the providers of the relevant large language models.