Advanced Persistent Threats (APTs) are stealthy, multi-stage attacks that require adaptive and timely defense. While deep reinforcement learning (DRL) enables autonomous cyber defense, its decisions are often opaque and difficult to trust in operational environments. This paper presents DeepXplain, an explainable DRL framework for stage-aware APT defense. Building on our prior DeepStage model, DeepXplain integrates provenance-based graph learning, temporal stage estimation, and a unified XAI pipeline that provides structural, temporal, and policy-level explanations. Unlike post-hoc methods, DeepXplain incorporates explanation signals directly into policy optimization through evidence alignment and confidence-aware reward shaping. To the best of our knowledge, DeepXplain is the first framework to integrate explanation signals into reinforcement learning for APT defense. Experiments in a realistic enterprise testbed show that the stage-weighted F1-score improves from 0.887 to 0.915 and the success rate from 84.7% to 89.6%, along with higher explanation confidence (0.86), improved fidelity (0.79), and more compact explanations (0.31). These results indicate that DeepXplain improves both the effectiveness and the trustworthiness of autonomous cyber defense.
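The abstract names evidence alignment and confidence-aware reward shaping without giving their form. The sketch below is only one way such a shaping term could look, not the formulation used in DeepXplain. It assumes that alignment is measured as the Jaccard overlap between the provenance nodes an explanation highlights and the nodes the chosen action depends on, and that the shaped reward is a weighted sum. The function names, the weights `alpha` and `beta`, and the node labels are all hypothetical.

```python
def evidence_alignment(explained_nodes: set, action_nodes: set) -> float:
    """Jaccard overlap between explanation evidence and action evidence.

    Assumption: alignment is a set overlap in [0, 1]; the paper may define it differently.
    """
    union = explained_nodes | action_nodes
    if not union:
        return 0.0  # no evidence on either side, so no alignment credit
    return len(explained_nodes & action_nodes) / len(union)


def shaped_reward(task_reward: float, alignment: float, confidence: float,
                  alpha: float = 0.2, beta: float = 0.1) -> float:
    """Add explanation-based bonuses to the environment reward.

    Assumption: a linear combination with hypothetical weights alpha and beta.
    """
    for name, value in (("alignment", alignment), ("confidence", confidence)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must lie in [0, 1], got {value}")
    return task_reward + alpha * alignment + beta * confidence


# Hypothetical provenance node labels, for illustration only.
explained = {"proc:powershell", "file:payload.dll"}
acted_on = {"file:payload.dll", "host:ws-12"}
align = evidence_alignment(explained, acted_on)
reward = shaped_reward(task_reward=1.0, alignment=align, confidence=0.8)
```

Under these assumptions, an action whose supporting evidence matches what the explanation highlights earns a larger bonus, so the policy is rewarded for decisions it can justify as well as for task success.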