Brewing Vodka: Distilling Pure Knowledge for Lightweight Threat Detection in Audit Logs

Advanced Persistent Threats (APTs) continue to evolve, and their stealth and persistence put increasing pressure on current provenance-based Intrusion Detection Systems (IDS). This evolution exposes three critical issues: (1) dense interaction between malicious and benign nodes in provenance graphs introduces neighbor noise, which hinders effective detection; (2) the complex prediction mechanisms of existing APT detection models make insufficient use of the prior knowledge embedded in the data; and (3) high computational cost makes detection impractical. To address these challenges, we propose Vodka, a lightweight threat detection system built on a knowledge distillation framework and capable of node-level detection in audit-log provenance graphs. Specifically, Vodka applies graph Laplacian regularization to reduce neighbor noise, yielding smoothed, denoised graph signals. Vodka then uses a GNN-based teacher model to extract knowledge, which is distilled into a lightweight student model. The student model is a trainable combination of a feature transformation module and a label propagation module based on personalized PageRank random walks; the former captures feature knowledge and the latter learns label and structural knowledge. After distillation, the student model draws on the teacher's knowledge to perform precise threat detection. Finally, Vodka reconstructs attack paths from anomalous nodes, providing insight into attackers' strategies. We evaluate Vodka through extensive experiments on three public datasets and compare it against several state-of-the-art IDS solutions. The results show that Vodka achieves outstanding detection accuracy across all scenarios and that its detection is 1.4 to 5.2 times faster than current state-of-the-art methods.
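The two graph operations named in the abstract, Laplacian-regularized smoothing and personalized PageRank label propagation, can be illustrated on a toy graph. The following is a minimal sketch under stated assumptions, not Vodka's implementation: the adjacency-list format, the function names `laplacian_smooth` and `ppr_propagate`, the parameters `lam` and `alpha`, the use of the unnormalized Laplacian L = D - A, and the five-node path graph are all hypothetical choices made for illustration. The trainable feature transformation module and the teacher-student distillation step are omitted.

```python
from typing import Dict, List

# Hypothetical toy format: undirected adjacency list with node ids 0..n-1.
Graph = Dict[int, List[int]]


def laplacian_smooth(adj: Graph, x0: List[float], lam: float = 1.0,
                     iters: int = 200) -> List[float]:
    """Approximately minimize ||x - x0||^2 + lam * x^T L x, with L = D - A.

    The minimizer solves (I + lam * L) x = x0. Jacobi iteration is used here;
    it converges because that matrix is strictly diagonally dominant.
    """
    x = list(x0)
    for _ in range(iters):
        x = [
            (x0[i] + lam * sum(x[j] for j in adj[i])) / (1.0 + lam * len(adj[i]))
            for i in range(len(x0))
        ]
    return x


def ppr_propagate(adj: Graph, y0: List[float], alpha: float = 0.15,
                  iters: int = 100) -> List[float]:
    """Personalized PageRank style propagation of seed labels.

    Iterates y <- alpha * y0 + (1 - alpha) * (mean of neighbor scores),
    where alpha is the restart (teleport) probability back to the seeds.
    """
    y = list(y0)
    for _ in range(iters):
        y = [
            alpha * y0[i]
            + (1.0 - alpha)
            * (sum(y[j] for j in adj[i]) / len(adj[i]) if adj[i] else y[i])
            for i in range(len(y0))
        ]
    return y


# Toy path graph 0 - 1 - 2 - 3 - 4 (an assumption, not data from the paper).
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}

# A single noisy spike at node 2 is spread over its neighborhood.
smoothed = laplacian_smooth(path, [0.0, 0.0, 1.0, 0.0, 0.0], lam=1.0)

# Node 0 is a known-malicious seed; scores decay with distance from it.
scores = ppr_propagate(path, [1.0, 0.0, 0.0, 0.0, 0.0], alpha=0.15)

print("smoothed signal:", [round(v, 4) for v in smoothed])
print("propagated scores:", [round(v, 4) for v in scores])
```

In this sketch, a larger `lam` smooths more aggressively, and a larger `alpha` keeps the propagated scores closer to the seed labels. In a setting like the one the abstract describes, such quantities would presumably be tuned or learned rather than fixed by hand.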
