Winemaking: Extracting Essential Insights for Efficient Threat Detection in Audit Logs

Advanced Persistent Threats (APTs) continue to evolve, exploiting their stealth and persistence to place growing pressure on current provenance-based Intrusion Detection Systems (IDS). This evolution exposes three critical issues: (1) dense interactions between malicious and benign nodes in provenance graphs introduce neighbor noise that hinders effective detection; (2) the complex prediction mechanisms of existing APT detection models leave the prior knowledge embedded in the data underutilized; and (3) high computational cost makes detection impractical to deploy. To address these challenges, we propose Winemaking, a lightweight threat detection system built on a knowledge distillation framework that performs node-level detection on audit log provenance graphs. Specifically, Winemaking first applies graph Laplacian regularization to suppress neighbor noise, yielding smoothed, denoised graph signals. It then uses a teacher model based on graph neural networks (GNNs) to extract knowledge, which is distilled into a lightweight student model. The student model is a trainable combination of a feature transformation module, which captures feature knowledge, and a personalized PageRank random-walk label propagation module, which learns label and structural knowledge. After distillation, the student model draws on the teacher's knowledge to perform precise threat detection. We evaluate Winemaking through extensive experiments on three public datasets and compare it against several state-of-the-art IDS solutions. The results show that Winemaking achieves high detection accuracy in all evaluated scenarios while detecting threats 1.4 to 5.2 times faster than current state-of-the-art methods.
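
The abstract does not give Winemaking's equations, so the NumPy sketch below only illustrates the three stages it names, under stated assumptions. Stage 1 assumes the common quadratic form of graph Laplacian regularization, min_X ||X - F||_F^2 + λ tr(XᵀLX) with L = D - A, whose minimizer is X* = (I + λL)^(-1) F. Stage 2 assumes a student whose per-node output mixes a feature-transformation prediction (here a hypothetical linear layer plus softmax) with a personalized PageRank iteration in the style of APPNP, using a row-stochastic transition matrix so every output row stays a probability distribution. Stage 3 assumes a KL-divergence distillation loss against the teacher GNN's soft labels. All function and parameter names (`laplacian_smooth`, `student_forward`, `gate`, `alpha=0.15`, `steps=10`, `distillation_loss`) are hypothetical.

```python
import numpy as np

def laplacian_smooth(adj, feats, lam=1.0):
    """Stage 1 (assumed form): solve min_X ||X - F||_F^2 + lam * tr(X^T L X)."""
    a = np.asarray(adj, dtype=float)
    a = np.maximum(a, a.T)                          # treat provenance edges as undirected
    lap = np.diag(a.sum(axis=1)) - a                # unnormalized Laplacian L = D - A
    # Zero gradient: 2(X - F) + 2*lam*L X = 0  =>  (I + lam*L) X = F
    return np.linalg.solve(np.eye(a.shape[0]) + lam * lap, feats)

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)            # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def student_forward(adj, feats, weight, labels, labeled, gate, alpha=0.15, steps=10):
    """Stage 2 (hypothetical student): feature transformation + PPR label propagation."""
    a = np.asarray(adj, dtype=float)
    a = np.maximum(a, a.T) + np.eye(a.shape[0])     # undirected, with self-loops
    p = a / a.sum(axis=1, keepdims=True)            # row-stochastic D^-1 (A + I)
    ft_pred = softmax(feats @ weight)               # feature-transformation module
    seeds = np.where(labeled[:, None], labels, ft_pred)  # known labels override predictions
    z = seeds.copy()
    for _ in range(steps):                          # personalized PageRank iteration
        z = (1 - alpha) * (p @ z) + alpha * seeds   # teleport back to the seeds with prob. alpha
    g = 1.0 / (1.0 + np.exp(-gate))[:, None]        # trainable per-node balance in (0, 1)
    return g * z + (1 - g) * ft_pred                # each row is still a probability distribution

def distillation_loss(teacher, student, eps=1e-12):
    """Stage 3 (assumed): mean KL(teacher || student) over nodes."""
    kl = teacher * (np.log(teacher + eps) - np.log(student + eps))
    return float(kl.sum(axis=1).mean())
```

The sketch shows only the forward pass and the objective. In an actual system, `weight` and `gate` would be fitted by minimizing `distillation_loss` against the teacher's soft labels in an autodiff framework. Real provenance graphs are also far too large for a dense solve, so a sparse Laplacian with an iterative solver such as SciPy's `scipy.sparse.linalg.cg` would be a natural substitute in Stage 1.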
