In cybersecurity, tracking the activities of coordinated hosts over time is a daunting task because both the participants and their behaviours evolve quickly. We address this scenario by solving a dynamic novelty discovery problem, with the aim of both re-identifying patterns seen in the past and highlighting new ones. We focus on traffic collected by Network Telescopes, a primary yet noisy source for cybersecurity analysis. We propose a 3-stage pipeline: (i) we learn compact representations (embeddings) of hosts from their traffic in a self-supervised fashion; (ii) via clustering, we distinguish groups of hosts performing similar activities; (iii) we track the temporal evolution of the clusters to highlight novel patterns. We apply our methodology to 20 days of telescope traffic, during which we observe more than 8 thousand active hosts. Our results show that we efficiently identify 50-70 well-shaped clusters per day, 60-70% of which we associate with already analysed cases, while we pinpoint 10-20 previously unseen clusters per day. These correspond to activity changes and new incidents, some of which we document. In short, our novelty discovery methodology greatly simplifies the manual analysis that security analysts must conduct to interpret novel coordinated activities.
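To make stage (iii) concrete, the following is a minimal sketch of how today's clusters might be compared against previously seen ones. The abstract does not state the matching criterion, so the Jaccard overlap of host sets, the 0.5 threshold, and all names (`jaccard`, `split_known_and_novel`, the host and cluster labels) are assumptions chosen purely for illustration and are not the authors' method.

```python
from typing import Dict, List, Optional, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two host sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def split_known_and_novel(
    past: Dict[str, Set[str]],
    today: Dict[str, Set[str]],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[Dict[str, str], List[str]]:
    """Match each of today's clusters to its most similar past cluster.

    Returns (known, novel): `known` maps a cluster id from today to the
    best-matching past cluster id; `novel` lists the cluster ids whose best
    match falls below `threshold`.
    """
    known: Dict[str, str] = {}
    novel: List[str] = []
    for cid, hosts in today.items():
        best_id: Optional[str] = None
        best_score = 0.0
        for pid, past_hosts in past.items():
            score = jaccard(hosts, past_hosts)
            if score > best_score:
                best_id, best_score = pid, score
        if best_id is not None and best_score >= threshold:
            known[cid] = best_id
        else:
            novel.append(cid)
    return known, novel


# Toy data: host identifiers and cluster labels are made up.
past_clusters = {"past_A": {"h1", "h2", "h3"}, "past_B": {"h4", "h5"}}
today_clusters = {"c0": {"h1", "h2", "h3", "h6"}, "c1": {"h7", "h8"}}

known, novel = split_known_and_novel(past_clusters, today_clusters)
print(known)  # {'c0': 'past_A'}
print(novel)  # ['c1']
```

In this toy run, `c0` shares three of four hosts with `past_A` (similarity 0.75) and is treated as an already analysed case, while `c1` has no overlap with any past cluster and is flagged for manual inspection. Matching on host membership is only one option; comparing cluster centroids in the embedding space would be another, and the abstract does not say which is used.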