LLM-driven automated penetration testing agents are typically evaluated against static targets that neither detect nor respond to attacks, so their behavior under intelligent defense remains untested. The causal consistency of multi-step attack chains likewise hinges on unstable LLM reasoning, and agent decisions remain opaque to human analysts. These three shortcomings, in realism, consistency, and auditability, are usually patched in isolation. We present ZERO-APT, a turn-based attacker-defender-judge framework that addresses all three within a single architecture. For realism, ZERO-APT embeds a configurable LLM Defender that consumes Sysmon telemetry and detects attacks in real time, exposing the attacker to a live opponent rather than a passive target. For consistency, three mechanisms move causal consistency out of unstable LLM reasoning and into the enforced system architecture: separation of planning from execution, multi-dimensional ReAct feedback, and an action library filtered by hard constraints. For auditability, a dedicated Judge agent adjudicates each round, maintains global state, and emits structured post-hoc cyber threat intelligence (CTI) reports that make every decision traceable. We evaluate a Windows Server 2022 post-exploitation prototype across five scenarios with three Defender configurations. ZERO-APT reaches a 79\% attack success rate (Aurora: 22\%, PentestGPT: 39\%) and a Causal Consistency Score of 0.860 (Aurora: 0.930, Claude Code: 0.520), and it provides end-to-end decision auditability through structured CTI reports. We release the benchmark to support evaluation of penetration testing agents under intelligent defense.
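The turn-based structure and the hard-constraint filter described above can be illustrated with a toy sketch. It is not the ZERO-APT implementation: the names `Action`, `GlobalState`, `feasible`, and `run_round` are hypothetical, the attacker and Defender are plain callables standing in for LLM agents, and the action names are abstract placeholders. The sketch only shows how precondition checks enforced by the system, rather than by LLM reasoning, keep an action chain causally ordered while a Judge-style log records each round.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Action:
    name: str
    requires: frozenset  # facts that must already hold before the action
    grants: frozenset    # facts established if the action succeeds


@dataclass
class GlobalState:
    facts: set = field(default_factory=set)  # state a Judge would maintain
    log: list = field(default_factory=list)  # per-round audit trail


def feasible(library, state):
    """Hard-constraint filter: keep actions whose preconditions hold
    and that would still add at least one new fact."""
    return [
        a for a in library
        if a.requires <= state.facts and not a.grants <= state.facts
    ]


def run_round(round_no, library, state, plan, detect):
    """One attacker-defender-judge turn. Returns False when no action is feasible."""
    options = feasible(library, state)
    if not options:
        return False
    action = plan(options)     # attacker stand-in chooses only among filtered options
    detected = detect(action)  # defender stand-in decides whether it caught the action
    if not detected:
        state.facts |= action.grants
    # Judge stand-in: record what was offered, what was chosen, and the outcome.
    state.log.append({
        "round": round_no,
        "offered": [a.name for a in options],
        "action": action.name,
        "detected": detected,
    })
    return True
```

Because the planner only ever sees the filtered options, an action whose prerequisites have not been established cannot be selected, however the planner reasons. The log entries are a simplified stand-in for the structured per-round records from which a post-hoc report could be built.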