TREC: APT Tactic / Technique Recognition via Few-Shot Provenance Subgraph Learning

Advanced Persistent Threats (APTs), characterized by persistence, stealth, and diversity, are among the greatest threats to cyber-infrastructure. As a countermeasure, existing studies leverage provenance graphs to capture the complex relations between system entities on a host for effective APT detection. Beyond detecting individual attack events, as most existing work does, understanding the tactics / techniques (e.g., Kill-Chain, ATT&CK) used to organize and carry out an APT campaign is more important for security operations. Existing studies manually design sets of rules to map low-level system events to high-level APT tactics / techniques. However, rule-based methods are coarse-grained and generalize poorly: they can recognize only APT tactics and cannot identify fine-grained APT techniques or mutant APT attacks. In this paper, we propose TREC, the first attempt to recognize APT tactics / techniques from provenance graphs using deep learning. To address the "needle in a haystack" problem, TREC uses a malicious node detection model and a subgraph sampling algorithm to segment small, compact subgraphs, each covering an individual APT technique instance, from a large provenance graph. To address the "training sample scarcity" problem, TREC trains the APT tactic / technique recognition model in a few-shot manner by adopting a Siamese neural network. We evaluate TREC on a customized dataset that our team collected and made public. The experimental results show that TREC significantly outperforms state-of-the-art systems in APT tactic recognition and can also effectively identify APT techniques.
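To make the few-shot idea concrete, the following is a minimal sketch of how a Siamese-style recognizer might be used once subgraphs have been embedded. It is not the TREC implementation. It assumes that some learned encoder (not shown) has already mapped each sampled subgraph to a fixed-length vector; the function names `contrastive_loss` and `recognize`, the Euclidean distance, the margin value, and the nearest-support decision rule are all illustrative assumptions, and the technique labels in the usage note are placeholders.

```python
import math
from typing import Dict, List, Optional, Sequence

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two embedding vectors of equal length."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def contrastive_loss(a: Vector, b: Vector, same_label: bool,
                     margin: float = 1.0) -> float:
    """Pairwise contrastive loss (an assumed training objective).

    Pairs with the same label are pulled together (loss = d^2);
    pairs with different labels are pushed at least `margin` apart.
    """
    d = euclidean(a, b)
    if same_label:
        return d ** 2
    return max(0.0, margin - d) ** 2


def recognize(query: Vector,
              support: Dict[str, List[Vector]]) -> Optional[str]:
    """Return the label of the support embedding closest to the query.

    `support` maps each label to the few embedded examples available
    for it. Returns None if the support set is empty.
    """
    best_label: Optional[str] = None
    best_dist = math.inf
    for label, examples in support.items():
        for example in examples:
            d = euclidean(query, example)
            if d < best_dist:
                best_label, best_dist = label, d
    return best_label
```

As a usage example, with one hypothetical support embedding per label, `recognize([0.1, 0.9], {"T1059": [[0.0, 1.0]], "T1003": [[1.0, 0.0]]})` selects `"T1059"` because that support vector is nearer to the query. Because classification only compares a query against a handful of labeled examples, a new technique can in principle be added by supplying a few embedded samples rather than retraining a full classifier.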
