Towards Effective Prompt Stealing Attack against Text-to-Image Diffusion Models

Text-to-Image (T2I) models, represented by DALL$\cdot$E and Midjourney, have gained huge popularity for creating realistic images. The quality of these images relies on carefully engineered prompts, which have become valuable intellectual property. While skilled prompters showcase their AI-generated art on markets to attract buyers, this business incidentally exposes them to \textit{prompt stealing attacks}. Existing state-of-the-art attacks reconstruct prompts from a fixed set of modifiers (i.e., style descriptions) and require model-specific training, which restricts their adaptability and effectiveness across diverse showcases (i.e., target images) and diffusion models. To alleviate these limitations, we propose Prometheus, a training-free, proxy-in-the-loop, search-based prompt-stealing attack that reverse-engineers the valuable prompts behind the showcases by interacting with a local proxy model. It consists of three innovative designs. First, we introduce dynamic modifiers as a supplement to the static modifiers used in prior work. These dynamic modifiers capture details specific to each showcase, and we use NLP analysis to generate them on the fly. Second, we design a contextual matching algorithm to sort both dynamic and static modifiers. This offline process reduces the search space of the subsequent step. Third, we interact with a local proxy model to invert the prompts with a greedy search algorithm, using its feedback to refine the prompt toward higher fidelity. The evaluation results show that Prometheus successfully extracts prompts from popular platforms such as PromptBase and AIFrog against diverse victim models, including Midjourney, Leonardo.ai, and DALL$\cdot$E, with an attack success rate (ASR) improvement of 25.0\%. We also validate that Prometheus is resistant to extensive potential defenses, further highlighting its severity in practice.
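The third design, greedy search guided by proxy feedback, can be illustrated with a minimal sketch. The abstract does not specify how fidelity is scored or how the search terminates, so everything below is an assumption made for illustration: `greedy_prompt_search`, `score_fn`, `max_modifiers`, and the toy scorer are hypothetical names, and the toy scorer merely stands in for "generate an image with the local proxy model and compare it with the showcase". It is not the Prometheus implementation.

```python
from typing import Callable, List


def greedy_prompt_search(
    subject: str,
    ranked_modifiers: List[str],
    score_fn: Callable[[str], float],
    max_modifiers: int = 5,
) -> str:
    """Greedily append modifiers that raise the fidelity score.

    ranked_modifiers is assumed to be pre-sorted (most promising first),
    mirroring the offline sorting step described in the abstract.
    score_fn is a hypothetical stand-in for proxy-model feedback.
    """
    prompt = subject
    best = score_fn(prompt)
    added = 0
    for modifier in ranked_modifiers:
        if added >= max_modifiers:
            break
        candidate = f"{prompt}, {modifier}"
        score = score_fn(candidate)
        # Keep the modifier only if the feedback score strictly improves.
        if score > best:
            prompt, best = candidate, score
            added += 1
    return prompt


# Toy feedback (assumption): reward phrases found in a hidden target prompt
# and lightly penalise phrases that are not part of it.
HIDDEN = {"a castle on a hill", "oil painting", "golden hour"}


def toy_score(prompt: str) -> float:
    parts = {p.strip() for p in prompt.split(",")}
    return len(parts & HIDDEN) - 0.1 * len(parts - HIDDEN)


stolen = greedy_prompt_search(
    "a castle on a hill",
    ["oil painting", "pixel art", "golden hour"],
    toy_score,
)
print(stolen)  # a castle on a hill, oil painting, golden hour
```

In this sketch each modifier is tried once, in ranked order, and is kept only when the score strictly improves, so the number of scoring calls grows linearly with the length of the modifier list. This is one plausible reason a sorted, pruned modifier list would reduce the cost of the search step.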
