In today's interconnected digital landscape, the proliferation of malware poses a significant threat to the security and stability of computer networks and systems worldwide. As malicious tactics, techniques, and procedures (TTPs) grow more complex in order to evade detection, so does the need for advanced methods capable of capturing and characterizing malware behavior. The current state of the art in malware classification and detection trains models on task-specific objectives; however, models trained this way fail to generalize to other downstream tasks involving the same malware class. In this paper, the authors introduce a novel method that combines convolutional neural networks, standard graph embedding techniques, and a metric learning objective to extract meaningful information from network flow data and create strong embeddings characterizing malware behavior. As demonstrated in this paper, these embeddings enable the development of highly accurate, efficient, and generalizable machine learning models for tasks such as malware strain classification, zero-day threat detection, and closest attack type attribution. A shift from task-specific objectives to strong embeddings will not only allow rapid iteration on cyber-threat detection models, but also allow different modalities to be introduced in the development of these models.
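To make the idea of a metric learning objective and its downstream uses concrete, here is a minimal sketch. It is not the authors' implementation: the abstract does not say which metric learning loss is used, so a triplet loss is assumed here, and the embeddings are toy 2-D arrays standing in for the output of the CNN and graph embedding stages. The function names (`triplet_loss`, `class_centroids`, `attribute`) and the `novelty_threshold` parameter are hypothetical.

```python
import numpy as np


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss on Euclidean distances.

    Assumption: the paper's metric learning objective is not specified in
    the abstract; triplet loss is used here purely as an illustration.
    """
    d_pos = np.linalg.norm(anchor - positive, axis=-1)
    d_neg = np.linalg.norm(anchor - negative, axis=-1)
    return np.maximum(d_pos - d_neg + margin, 0.0).mean()


def class_centroids(embeddings, labels):
    """Mean embedding for each known malware family."""
    return {
        str(c): embeddings[labels == c].mean(axis=0)
        for c in np.unique(labels)
    }


def attribute(embedding, centroids, novelty_threshold):
    """Return (closest_family, distance, is_novel).

    A sample farther than `novelty_threshold` (a hypothetical parameter)
    from every known centroid is flagged as a possible zero-day, while
    still reporting the closest known family for attribution.
    """
    names = list(centroids)
    dists = np.array([np.linalg.norm(embedding - centroids[n]) for n in names])
    i = int(dists.argmin())
    return names[i], float(dists[i]), bool(dists[i] > novelty_threshold)


# Toy embeddings for two made-up families.
emb = np.array([[0.0, 0.0], [0.2, 0.0], [5.0, 5.0], [5.0, 5.2]])
labels = np.array(["family_a", "family_a", "family_b", "family_b"])
centroids = class_centroids(emb, labels)

known = attribute(np.array([0.1, 0.1]), centroids, novelty_threshold=1.0)
unseen = attribute(np.array([20.0, -20.0]), centroids, novelty_threshold=1.0)
```

In this sketch the same embedding space serves all three tasks named in the abstract: strain classification (nearest centroid), zero-day detection (distance above a threshold), and closest attack type attribution (the nearest family reported even for a novel sample). Only the lightweight logic on top of the embeddings changes from task to task.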