Trusted Execution Environments (TEEs) such as Intel SGX and Arm TrustZone aim to protect sensitive computation from a compromised operating system, yet real deployments remain vulnerable to microarchitectural leakage, side-channel attacks, and fault injection. In parallel, security teams increasingly rely on Large Language Model (LLM) assistants as security advisors for TEE architecture review, mitigation planning, and vulnerability triage. This creates a socio-technical risk surface: assistants may hallucinate TEE mechanisms, overclaim guarantees (e.g., what attestation does and does not establish), or behave unsafely under adversarial prompting. We present a red-teaming study of two widely deployed LLM assistants acting as TEE security advisors, ChatGPT-5.2 and Claude Opus-4.6, focusing on the inherent limitations of each and the transferability of prompt-induced failures across LLMs. We introduce TEE-RedBench, a TEE-grounded evaluation methodology comprising (i) a TEE-specific threat model for LLM-mediated security work, (ii) a structured prompt suite spanning SGX and TrustZone architecture, attestation and key management, threat modeling, and non-operational mitigation guidance, along with policy-bound misuse probes, and (iii) an annotation rubric that jointly measures technical correctness, groundedness, uncertainty calibration, refusal quality, and safe helpfulness. We find that some failures are not purely idiosyncratic, with up to 12.02% transferring across LLM assistants, and we connect these findings to secure architecture by outlining an "LLM-in-the-loop" evaluation pipeline of policy gating, retrieval grounding, structured templates, and lightweight verification checks that, combined, reduces failures by 80.62%.