Large language models (LLMs) have profoundly transformed natural language applications, and chatbots are increasingly designed through instruction-based definitions. Once deployed, however, these chatbot definitions are fixed and vulnerable to attacks by malicious users, which makes it important to prevent unethical use and financial loss. Existing studies examine how user prompts affect LLM-based chatbots, yet practical methods for containing attacks on application-specific chatbots remain unexplored. This paper presents System Prompt Meta Language (SPML), a domain-specific language for refining prompts and monitoring the inputs to LLM-based chatbots. SPML actively checks for attack prompts, ensuring that user inputs align with the chatbot definition so that malicious inputs are not executed on the LLM backbone, which also reduces cost. It also streamlines the crafting of chatbot definitions by bringing programming-language capabilities to the task, overcoming the challenges of designing them in natural language. Additionally, we introduce a benchmark with 1.8k system prompts and 20k user inputs, offering the first language and benchmark for evaluating chatbot definitions. Experiments across datasets demonstrate SPML's proficiency in understanding attacker prompts, surpassing models such as GPT-4, GPT-3.5, and LLAMA. Our data and code are publicly available at: https://prompt-compiler.github.io/SPML/.