Prompt injection remains a central obstacle to the safe deployment of large language models, particularly in multi-agent settings where intermediate outputs can propagate or amplify malicious instructions. Building on earlier work that introduced a four-metric Total Injection Vulnerability Score (TIVS), this paper extends the evaluation framework with semantic similarity-based caching and a fifth metric, the Observability Score Ratio (OSR), to yield TIVS-O, and investigates how defence effectiveness interacts with transparency in a HOPE-inspired Nested Learning architecture. The proposed system combines an agentic pipeline with Continuum Memory Systems that implement semantic similarity-based caching, and it is evaluated on 301 synthetically generated injection-focused prompts drawn from ten attack families, while a fourth agent performs comprehensive security analysis using five key performance indicators. In addition to the traditional injection metrics, OSR quantifies the richness and clarity of the security-relevant reasoning exposed by each agent, enabling an explicit analysis of the trade-off between strict mitigation and auditability. Experiments show that the system produces secure responses with zero high-risk breaches, while semantic caching reduces LLM calls by 41.6%, with corresponding decreases in latency, energy consumption, and carbon emissions. Five TIVS-O configurations reveal optimal trade-offs between mitigation strictness and forensic transparency. These results indicate that observability-aware evaluation can reveal non-monotonic effects within multi-agent pipelines, and that memory-augmented agents can jointly maximize security robustness, real-time performance, operational cost savings, and environmental sustainability without modifying the underlying model weights, providing a production-ready pathway for secure and green LLM deployments.
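To make the idea of semantic similarity-based caching concrete, the following is a minimal sketch, not the system described in the abstract. It assumes a toy bag-of-words embedding, cosine similarity, and a similarity threshold of 0.8; the names `embed`, `SemanticCache`, `get_or_compute`, and `call_reduction` are all hypothetical, and the abstract does not specify the embedding model, the threshold, or the cache layout actually used.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


class SemanticCache:
    """Reuse a stored response when a new prompt is close enough to a cached one."""

    def __init__(self, threshold=0.8):  # 0.8 is an assumed value, not from the paper
        self.threshold = threshold
        self.entries = []  # list of (vector, response) pairs
        self.hits = 0
        self.misses = 0

    def get_or_compute(self, prompt, compute):
        """Return a cached response on a near match, else call compute(prompt)."""
        vector = embed(prompt)
        best_score, best_response = 0.0, None
        for cached_vector, cached_response in self.entries:
            score = cosine(vector, cached_vector)
            if score > best_score:
                best_score, best_response = score, cached_response
        if best_response is not None and best_score >= self.threshold:
            self.hits += 1
            return best_response
        self.misses += 1
        response = compute(prompt)  # stands in for the expensive LLM call
        self.entries.append((vector, response))
        return response

    def call_reduction(self):
        """Fraction of requests served from the cache instead of the model."""
        total = self.hits + self.misses
        return self.hits / total if total else 0.0
```

In this sketch, the fraction returned by `call_reduction` plays the role of the reported reduction in LLM calls: every cache hit is one model invocation avoided. The threshold governs the trade-off; a lower value saves more calls but raises the chance that a cached analysis is reused for a prompt that differs in a security-relevant way.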