Text-to-image generation models have revolutionized the artwork design process and enabled anyone to create high-quality images by entering text descriptions called prompts. Creating a high-quality prompt, which consists of a subject and several modifiers, can be time-consuming and costly. As a result, a trend of trading high-quality prompts on specialized marketplaces has emerged. In this paper, we propose a novel attack, the prompt stealing attack, which aims to steal prompts from images generated by text-to-image generation models. Successful prompt stealing attacks directly violate the intellectual property and privacy of prompt engineers and also jeopardize the business model of prompt trading marketplaces. We first perform a large-scale analysis on a dataset that we collected and show that a successful prompt stealing attack should consider a prompt's subject as well as its modifiers. We then propose the first learning-based prompt stealing attack, PromptStealer, and demonstrate its superiority over two baseline methods both quantitatively and qualitatively. We also make some initial attempts to defend against PromptStealer. In general, our study uncovers a new attack surface in the ecosystem created by popular text-to-image generation models. We hope our results can help to mitigate the threat. To facilitate research in this field, we will share our dataset and code with the community.
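To make the subject/modifier structure concrete, the following is a minimal sketch and not the PromptStealer implementation. It assumes, purely for illustration, that a prompt is a comma-separated string whose first segment is the subject and whose remaining segments are modifiers. The names `Prompt`, `split_prompt`, and `modifier_overlap` are hypothetical, and the Jaccard overlap is an illustrative way to compare a stolen prompt's modifiers with the target's, not necessarily a metric used in the paper.

```python
from dataclasses import dataclass, field


@dataclass
class Prompt:
    """A prompt viewed as one subject plus a list of modifiers."""
    subject: str
    modifiers: list[str] = field(default_factory=list)


def split_prompt(text: str) -> Prompt:
    # Assumption: segments are comma-separated and the first one is the subject.
    parts = [p.strip() for p in text.split(",") if p.strip()]
    if not parts:
        return Prompt(subject="")
    return Prompt(subject=parts[0], modifiers=parts[1:])


def modifier_overlap(target: Prompt, stolen: Prompt) -> float:
    # Jaccard similarity between the two modifier sets (case-insensitive).
    a = {m.lower() for m in target.modifiers}
    b = {m.lower() for m in stolen.modifiers}
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


target = split_prompt(
    "a cat sitting on a sofa, digital art, highly detailed, trending on artstation"
)
stolen = split_prompt("a cat, digital art, 4k")
print(target.subject)
print(target.modifiers)
print(modifier_overlap(target, stolen))
```

In this toy example the stolen prompt recovers the rough subject but only one of the three target modifiers, which illustrates why an attack that captures the subject alone would fall short.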