ProHunter: A Comprehensive APT Hunting System Based on Whole-System Provenance

Advanced Persistent Threats (APTs) remain difficult to detect due to their stealthy nature and long-term persistence. To tackle this challenge, provenance-based threat hunting has gained traction as a proactive defense mechanism. This technique models audit logs as a whole-system provenance graph and searches for subgraphs that match APT patterns recorded in Cyber Threat Intelligence (CTI) reports. However, several limitations persist: 1) significant memory and time overhead due to the extremely large provenance graphs; 2) imprecise segmentation of APT activities from provenance graphs due to their intricate entanglement with benign operations; and 3) poor alignment of attack representations between CTI-derived query graphs and provenance graphs due to the substantial semantic gaps between them. To address these limitations, this paper presents ProHunter, an efficient and accurate provenance-based APT hunting system with a platform-independent design. To minimize system overhead, ProHunter creates a compact data structure that efficiently stores long-term provenance graphs using semantic abstraction and bit-level hierarchical encoding strategies. To segment APT behaviors, ProHunter uses a heuristic-driven threat graph sampling algorithm that extracts precise attack patterns from provenance graphs. Furthermore, to bridge the semantic gaps between CTI-derived graphs and provenance graphs, ProHunter introduces adaptive graph representation and feature enhancement methods, enabling the extraction of consistent attack semantics at both local and global levels. Extensive evaluations on real-world APT campaigns from the DARPA TC E3, E5, and OpTC datasets demonstrate that ProHunter outperforms state-of-the-art threat hunting systems in terms of efficiency and accuracy. Our code is available at https://github.com/xueboQiu/ProHunter.
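
The abstract does not specify how the bit-level hierarchical encoding is laid out. As a purely illustrative sketch of the general idea, and not ProHunter's actual data structure, the Python snippet below packs an abstracted provenance edge (source entity, event type, destination entity) into a single integer. The field widths, the type tables, and the helper names (`encode_node`, `encode_edge`, and so on) are all assumptions made for this example.

```python
# Illustrative only: field widths and type tables are hypothetical,
# not taken from the ProHunter paper or repository.
ENTITY_BITS = 2    # coarse entity class (top level of the hierarchy)
ID_BITS = 26       # identifier within that class (lower level)
EVENT_BITS = 4     # abstracted event type
NODE_BITS = ENTITY_BITS + ID_BITS

ENTITY_TYPES = {"process": 0, "file": 1, "socket": 2}
EVENT_TYPES = {"read": 0, "write": 1, "exec": 2, "connect": 3}
ENTITY_NAMES = {v: k for k, v in ENTITY_TYPES.items()}
EVENT_NAMES = {v: k for k, v in EVENT_TYPES.items()}


def encode_node(entity_type: str, entity_id: int) -> int:
    """Place the entity class in the high bits and the identifier in the low bits."""
    if not 0 <= entity_id < (1 << ID_BITS):
        raise ValueError("entity_id does not fit in ID_BITS")
    return (ENTITY_TYPES[entity_type] << ID_BITS) | entity_id


def decode_node(code: int) -> tuple[str, int]:
    """Recover (entity class, identifier) from a packed node code."""
    return ENTITY_NAMES[code >> ID_BITS], code & ((1 << ID_BITS) - 1)


def encode_edge(src: int, event: str, dst: int) -> int:
    """Pack source node, event type, and destination node into one integer."""
    return (src << (EVENT_BITS + NODE_BITS)) | (EVENT_TYPES[event] << NODE_BITS) | dst


def decode_edge(code: int) -> tuple[tuple[str, int], str, tuple[str, int]]:
    """Unpack an edge code back into (source, event, destination)."""
    dst = code & ((1 << NODE_BITS) - 1)
    event = (code >> NODE_BITS) & ((1 << EVENT_BITS) - 1)
    src = code >> (NODE_BITS + EVENT_BITS)
    return decode_node(src), EVENT_NAMES[event], decode_node(dst)


if __name__ == "__main__":
    src = encode_node("process", 42)
    dst = encode_node("file", 7)
    edge = encode_edge(src, "write", dst)
    print(decode_edge(edge))
```

With these assumed widths, one edge occupies 60 bits and therefore fits in a single 64-bit word, which hints at why a bit-packed layout can cut memory use compared with storing each edge as a record of strings.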
