SPARSE: Semantic Tracking and Path Analysis for Attack Investigation in Real-time

As Advanced Persistent Threats (APTs) grow more complex and destructive, there is growing interest in attack investigation: identifying the series of actions an attacker takes to achieve a goal. Currently, analysts construct a provenance graph and perform causality analysis on a Point-Of-Interest (POI) event to capture critical events (those related to the attack). However, because provenance graphs are vast and critical events are rare, existing attack investigation methods suffer from high false-positive rates, high overhead, and high latency. To this end, we propose SPARSE, an efficient, real-time system for constructing critical component graphs (i.e., graphs consisting of critical events) from streaming logs. Our key observations are that 1) critical events exist in a suspicious semantic graph (SSG) composed of interaction flows between suspicious entities, and 2) information flows that accomplish the attacker's goal exist in the form of paths. Therefore, SPARSE uses a two-stage framework for attack investigation: constructing the SSG and performing path-level contextual analysis. First, SPARSE operates in a state-based mode in which events are consumed as streams, allowing easy access to the SSG related to the POI event through semantic transfer rules and a storage strategy. Then, SPARSE identifies all suspicious flow paths (SFPs) related to the POI event in the SSG and quantifies the influence of each path to filter out irrelevant events. Our evaluation on a real large-scale attack dataset shows that SPARSE can generate a critical component graph (~113 edges) in 1.6 seconds, which is 2014× smaller than the backtracking graph (~227,589 edges). SPARSE is 25× more effective than other state-of-the-art techniques at filtering irrelevant edges.
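To make the two-stage idea concrete, the following is a minimal Python sketch and not the SPARSE implementation. The abstract does not specify the transfer rules, storage strategy, or influence metric, so everything here is an assumption made for illustration: the event format `(timestamp, source, destination)`, the seed set, the rule "suspicion propagates from a suspicious source to its destination", the length-based decay score, the 0.1 threshold, and all entity names are hypothetical.

```python
# Stage 1: consume events as a stream, keep per-entity state, and store only
# edges whose source is already suspicious (a toy stand-in for an SSG).
# Stage 2: enumerate time-respecting paths to the POI entity and score them.
# All rules, scores, and names below are illustrative assumptions.


class StreamingSSG:
    def __init__(self, seeds):
        self.seeds = set(seeds)
        self.suspicious = set(seeds)   # per-entity state
        self.out_edges = {}            # src -> [(timestamp, dst)]

    def consume(self, event):
        ts, src, dst = event
        # Assumed transfer rule: suspicion follows the information flow.
        if src in self.suspicious:
            self.suspicious.add(dst)
            self.out_edges.setdefault(src, []).append((ts, dst))

    def paths_to(self, target):
        """Return simple paths from any seed to target with non-decreasing timestamps."""
        found = []

        def dfs(node, last_ts, visited, edges):
            if node == target and edges:
                found.append(list(edges))
                return
            for ts, nxt in self.out_edges.get(node, ()):
                if ts >= last_ts and nxt not in visited:
                    edges.append((ts, node, nxt))
                    dfs(nxt, ts, visited | {nxt}, edges)
                    edges.pop()

        for seed in sorted(self.seeds):
            dfs(seed, float("-inf"), {seed}, [])
        return found


def influence(path, decay=0.5):
    # Assumed score: each extra hop halves the influence of a path.
    return decay ** len(path)


events = [
    (1, "mal.doc", "winword.exe"),
    (2, "explorer.exe", "notes.txt"),        # benign source, never stored
    (3, "winword.exe", "dropper.exe"),
    (4, "winword.exe", "cache.tmp"),
    (5, "winword.exe", "recent.lnk"),
    (6, "dropper.exe", "evil.example:443"),
    (7, "cache.tmp", "indexer.exe"),
    (8, "indexer.exe", "evil.example:443"),
]

ssg = StreamingSSG(seeds={"mal.doc"})
for ev in events:
    ssg.consume(ev)

# Treat the outbound connection as the POI and analyze the paths reaching it.
paths = ssg.paths_to("evil.example:443")
kept = [p for p in paths if influence(p) >= 0.1]
critical = {edge for p in kept for edge in p}

for edge in sorted(critical):
    print(edge)
```

In this toy run, two candidate paths reach the POI entity; the three-hop path through `dropper.exe` scores 0.125 and is kept, while the four-hop path through `cache.tmp` scores 0.0625 and is filtered out. A real system would replace the decay score with a metric grounded in event semantics.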
