Advanced Persistent Threats (APTs) are stealthy, multi-stage attacks that require adaptive and timely defense. While deep reinforcement learning (DRL) enables autonomous cyber defense, its decisions are often opaque and difficult to trust in operational environments. This paper presents DeepXplain, an explainable DRL framework for stage-aware APT defense. Building on our prior DeepStage model, DeepXplain integrates provenance-based graph learning, temporal stage estimation, and a unified XAI pipeline that provides structural, temporal, and policy-level explanations. Unlike post-hoc methods, DeepXplain incorporates explanation signals directly into policy optimization through evidence alignment and confidence-aware reward shaping. To the best of our knowledge, DeepXplain is the first framework to integrate explanation signals into reinforcement learning for APT defense. Experiments in a realistic enterprise testbed show improvements in stage-weighted F1-score (0.887 to 0.915) and success rate (84.7% to 89.6%), along with higher explanation confidence (0.86), improved fidelity (0.79), and more compact explanations (0.31). These results demonstrate enhanced effectiveness and trustworthiness of autonomous cyber defense.
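The abstract names evidence alignment and confidence-aware reward shaping without giving their form, so the following is only an illustrative sketch of how an explanation signal might enter a reward, not DeepXplain's actual formulation. Everything in it is an assumption made for illustration: the `Explanation` container, the use of Jaccard overlap as the alignment measure, the additive bonus, and the weight `beta`.

```python
from dataclasses import dataclass


@dataclass
class Explanation:
    """Hypothetical container for one step's explanation signals."""
    confidence: float          # explainer confidence, assumed to lie in [0, 1]
    evidence_nodes: frozenset  # provenance-graph nodes the explainer highlights


def evidence_alignment(explanation, acted_on_nodes):
    """Jaccard overlap between highlighted evidence and the nodes an action touched.

    Assumption: Jaccard overlap stands in for whatever alignment measure
    the paper actually uses.
    """
    acted = frozenset(acted_on_nodes)
    union = explanation.evidence_nodes | acted
    if not union:
        return 0.0
    return len(explanation.evidence_nodes & acted) / len(union)


def shaped_reward(task_reward, explanation, acted_on_nodes, beta=0.1):
    """Add a confidence-scaled alignment bonus to the environment reward.

    Assumption: the bonus is additive and weighted by a hypothetical
    coefficient `beta`; the paper may combine the terms differently.
    """
    if not 0.0 <= explanation.confidence <= 1.0:
        raise ValueError("confidence must lie in [0, 1]")
    alignment = evidence_alignment(explanation, acted_on_nodes)
    return task_reward + beta * explanation.confidence * alignment
```

Under these assumptions, an action that touches the nodes the explainer highlights earns a small bonus, and the bonus shrinks when the explainer itself is unsure, so low-confidence explanations have little influence on the policy update.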