LAAF: Logic-layer Automated Attack Framework - A Systematic Red-Teaming Methodology for LPCI Vulnerabilities in Agentic Large Language Model Systems

Hammad Atta, Ken Huang, Kyriakos Rock Lambros, Yasir Mehmood, Zeeshan Baig, Mohamed Abdur Rahman, Manish Bhatt, M. Aziz Ul Haq, Muhammad Aatif, Nadeem Shahzad, Kamal Noor, Vineeth Sai Narajala, Hazem Ali, Jamel Abed

Agentic LLM systems equipped with persistent memory, RAG pipelines, and external tool connectors face a class of attacks, Logic-layer Prompt Control Injection (LPCI), for which no automated red-teaming instrument previously existed. We present LAAF (Logic-layer Automated Attack Framework), the first automated red-teaming framework to combine an LPCI-specific technique taxonomy with stage-sequential seed escalation. Both capabilities are absent from existing tools: Garak lacks memory-persistence and cross-session triggering, and PyRIT supports multi-turn testing but treats turns independently, without seeding each stage from the prior breakthrough. LAAF provides: (i) a 49-technique taxonomy spanning six attack categories (Encoding~11, Structural~8, Semantic~8, Layered~5, Trigger~12, Exfiltration~5; see Table 1), combinable across 5 variants per technique and 6 lifecycle stages, yielding a theoretical maximum of 2,822,400 unique payloads ($49 \times 5 \times 1{,}920 \times 6$; SHA-256 deduplicated at generation time); and (ii) a Persistent Stage Breaker (PSB) that drives payload mutation stage by stage: on each breakthrough, the PSB seeds the next stage with a mutated form of the winning payload, mirroring real adversarial escalation. Evaluation on five production LLM platforms across three independent runs shows that LAAF achieves higher stage-breakthrough efficiency than single-technique random testing, with a mean aggregate breakthrough rate of 84\% (range 83--86\%) and platform-level rates stable within 17 percentage points across runs. Layered combinations and semantic reframing are the most effective technique categories, with layered payloads outperforming encoding on well-defended platforms.
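
The payload-space arithmetic and the deduplication step quoted in the abstract can be checked with a minimal sketch. This is an illustration only, not LAAF's implementation: the function names `theoretical_payload_count` and `dedupe_by_sha256` are hypothetical, the 1,920 factor is taken as given from the stated product (the abstract does not say what it counts), and the sketch assumes payloads are plain strings hashed as UTF-8.

```python
import hashlib
from math import prod

# Category sizes as listed in the abstract (Table 1 of the paper).
CATEGORY_SIZES = {
    "Encoding": 11,
    "Structural": 8,
    "Semantic": 8,
    "Layered": 5,
    "Trigger": 12,
    "Exfiltration": 5,
}

# Factors of the stated product 49 x 5 x 1,920 x 6.
# The 1,920 term is copied from the abstract as-is; its meaning is not assumed here.
FACTORS = (49, 5, 1920, 6)


def theoretical_payload_count(factors=FACTORS):
    """Multiply the stated factors to get the theoretical payload maximum."""
    return prod(factors)


def dedupe_by_sha256(payloads):
    """Keep the first occurrence of each payload, keyed by its SHA-256 digest.

    Hypothetical helper: assumes each payload is a str and that exact
    byte-level equality is the intended notion of a duplicate.
    """
    seen = set()
    unique = []
    for payload in payloads:
        digest = hashlib.sha256(payload.encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(payload)
    return unique


if __name__ == "__main__":
    # The six category sizes should add up to the 49 techniques.
    print(sum(CATEGORY_SIZES.values()))
    # The product should match the 2,822,400 figure in the abstract.
    print(theoretical_payload_count())
    # Placeholder strings stand in for generated payloads.
    print(dedupe_by_sha256(["p1", "p2", "p1"]))
```

Hashing at generation time keeps the seen-set small (one fixed-size digest per payload) regardless of payload length, which is a plausible reason for choosing a digest over storing full payloads, though the abstract does not state the motivation.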
