We introduce LLMmap, a first-generation fingerprinting attack targeting LLM-integrated applications. LLMmap takes an active fingerprinting approach: it sends carefully crafted queries to the application and analyzes the responses to identify the specific LLM in use. With as few as 8 interactions, LLMmap identifies the underlying LLM with over 95% accuracy. More importantly, LLMmap is designed to be robust across application layers, so it can identify LLMs operating under different system prompts, stochastic sampling hyperparameters, and even complex generation frameworks such as RAG or Chain-of-Thought.
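The active fingerprinting loop described above (send probes, collect responses, infer the model) can be illustrated with a minimal sketch. This is not LLMmap's actual inference procedure, which the abstract does not specify. It swaps in a plain token-overlap (Jaccard) comparison against stored reference responses purely to show the control flow. The names `identify_model`, `query_app`, `probes`, and `references`, as well as any model labels passed in, are hypothetical.

```python
from typing import Callable, Dict, List, Tuple


def jaccard(a: str, b: str) -> float:
    """Token-set overlap between two strings, in [0, 1]."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0  # two empty responses are treated as identical
    return len(ta & tb) / len(ta | tb)


def identify_model(
    query_app: Callable[[str], str],
    probes: List[str],
    references: Dict[str, List[str]],
) -> Tuple[str, Dict[str, float]]:
    """Send each probe to the application and score every candidate model.

    Assumption: `references[name][i]` holds a response previously
    recorded from model `name` for `probes[i]`. A real attack would
    need a matcher that tolerates system prompts and sampling noise;
    Jaccard similarity is only a stand-in here.
    """
    # Active step: one interaction with the target per probe.
    responses = [query_app(p) for p in probes]

    # Inference step: average similarity to each model's reference answers.
    scores = {
        name: sum(jaccard(r, e) for r, e in zip(responses, expected)) / len(probes)
        for name, expected in references.items()
    }
    best = max(scores, key=scores.get)
    return best, scores
```

Here `query_app` would wrap whatever interface the target application exposes (for example, a function that posts a message to a chat endpoint and returns the reply text), and the number of probes corresponds to the number of interactions spent on the attack.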