Pattern Enhanced Multi-Turn Jailbreaking: Exploiting Structural Vulnerabilities in Large Language Models

Large language models (LLMs) remain vulnerable to multi-turn jailbreaking attacks that exploit conversational context to gradually bypass safety constraints. These attacks target different harm categories (such as malware generation, harassment, or fraud) through distinct conversational approaches (educational discussions, personal experiences, hypothetical scenarios). Existing multi-turn jailbreaking methods often rely on heuristic or ad hoc exploration strategies and provide limited insight into underlying model weaknesses. The relationship between conversation patterns and model vulnerabilities across harm categories remains poorly understood. We propose Pattern Enhanced Chain of Attack (PE-CoA), a framework of five conversation patterns for constructing effective multi-turn jailbreaks through natural dialogue. Evaluating PE-CoA on twelve LLMs across ten harm categories, we achieve state-of-the-art performance and uncover pattern-specific vulnerabilities and LLM behavioral characteristics: models exhibit distinct weakness profiles, in which robustness to one conversational pattern does not generalize to others, and models within the same family share similar failure modes. These findings highlight the limitations of current safety training and indicate the need for pattern-aware defenses. Code is available at: https://github.com/Ragib-Amin-Nihal/PE-CoA
