Advanced persistent threat (APT) attacks remain difficult to detect because they are stealthy, adaptable, and make use of legitimate system components. Provenance-based intrusion detection systems (PIDS) offer a promising defense by capturing detailed relationships between system components and actions. However, current PIDS rely on thresholds that are either predefined or determined from a subset of the data, which limits detection stability and the ability to detect anomalous behavior in general. Furthermore, related work often neglects the role of process executables, which characterize system activity: an executable acts through its process, which in turn interacts with files, network components, and other processes. We introduce GRASP, a PIDS based on masked self-supervised classification. GRASP masks the executable information of each process and learns to infer it from the process's two-hop neighborhood in the provenance graph; processes whose executable is misclassified are marked as anomalies. GRASP thus captures behavior patterns for the learned executables without thresholding, which makes it robust against interference and unknown activities. Evaluations on the DARPA TC and OpTC datasets demonstrate that GRASP consistently detects anomalous behavior, including known attack-related activities, and outperforms existing systems. Our PIDS identifies all documented attacks on datasets where the behavior of executables is learnable. In addition, in contrast to existing systems, GRASP uncovers potentially malicious anomalous behavior that the documentation does not label as an attack.
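The masking idea described above can be illustrated with a minimal toy sketch. It is not GRASP's actual model: the abstract does not specify the classifier, so a nearest-profile classifier with cosine similarity stands in for it here. The graph layout, the node types (`process`, `file`, `socket`), the node and executable names, and all function names are hypothetical choices made for the illustration.

```python
import math
from collections import Counter, defaultdict


def build_graph(edges):
    """Undirected adjacency sets from (node, node) provenance edges."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
        graph[b].add(a)
    return graph


def two_hop_features(graph, node_types, node):
    """Count (hop, node type) pairs in the two-hop neighborhood of `node`.

    The executable of `node` itself is never used: it is the masked target.
    """
    feats = Counter()
    one_hop = set(graph.get(node, ())) - {node}
    for n in one_hop:
        feats[(1, node_types[n])] += 1
    two_hop = set()
    for n in one_hop:
        two_hop.update(graph.get(n, ()))
    two_hop -= one_hop | {node}
    for n in two_hop:
        feats[(2, node_types[n])] += 1
    return feats


def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(graph, node_types, executables):
    """Sum neighborhood features per executable (assumed stand-in model)."""
    profiles = defaultdict(Counter)
    for proc, exe in executables.items():
        profiles[exe].update(two_hop_features(graph, node_types, proc))
    return dict(profiles)


def predict(profiles, feats):
    """Return the executable whose profile is most similar to `feats`."""
    return max(sorted(profiles), key=lambda exe: cosine(profiles[exe], feats))


def detect(profiles, graph, node_types, executables):
    """Flag processes whose inferred executable differs from the recorded one.

    No score threshold is involved: the decision is the misclassification.
    """
    return [
        proc
        for proc, exe in executables.items()
        if predict(profiles, two_hop_features(graph, node_types, proc)) != exe
    ]


# Hypothetical node types for every node used below.
node_types = {
    "p1": "process", "p2": "process", "p3": "process", "p4": "process",
    "p5": "process", "p6": "process", "c1": "process",
    "s1": "socket", "s2": "socket", "s3": "socket", "s4": "socket", "s5": "socket",
    "f1": "file", "f2": "file", "f3": "file", "f4": "file", "f9": "file",
}

# Benign training activity: "sshd" touches sockets, "cat" reads files.
train_graph = build_graph([
    ("p1", "s1"), ("p1", "s2"), ("p1", "c1"), ("c1", "f9"),
    ("p2", "s3"), ("p3", "f1"), ("p3", "f2"), ("p4", "f3"),
])
train_exes = {"p1": "sshd", "p2": "sshd", "p3": "cat", "p4": "cat"}
profiles = train(train_graph, node_types, train_exes)

# Test activity: p6 claims to be "cat" but behaves like a network process.
test_graph = build_graph([("p5", "f4"), ("p6", "s4"), ("p6", "s5")])
test_exes = {"p5": "cat", "p6": "cat"}
anomalies = detect(profiles, test_graph, node_types, test_exes)
print(anomalies)  # ['p6']
```

In this toy setting, `p6` is flagged because its neighborhood matches the profile learned for `sshd` rather than for `cat`, while `p5` is classified correctly and stays unflagged. The decision depends only on whether the prediction matches the recorded executable, which mirrors the threshold-free property the abstract claims.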