Network analysis and machine learning techniques have been widely applied to build malware detection systems. Although these systems attain impressive results, they are often $(i)$ not extensible, being monolithic and well tuned for the specific task they were designed for but very difficult to adapt or extend to other settings, and $(ii)$ not interpretable, being black boxes whose inner complexity makes it impossible to link a detection result to its root cause, which makes further analysis of threats a challenge. In this paper we present RADAR, an extensible and explainable system that exploits the popular TTP (Tactics, Techniques, and Procedures) ontology of adversary behaviour described in the industry-standard MITRE ATT\&CK framework to unequivocally identify and classify malicious behaviour from network traffic. We evaluate RADAR on a very large dataset comprising 2,286,907 malicious and benign samples, representing a total of 84,792,452 network flows. The experimental analysis confirms that the proposed methodology can be effectively exploited: RADAR's ability to detect malware is comparable to that of other state-of-the-art non-interpretable systems. To the best of our knowledge, RADAR is the first TTP-based system for malware detection that uses machine learning while being extensible and explainable.