In cybersecurity, tracking the activities of coordinated hosts over time is a daunting task because both the participants and their behaviours evolve quickly. We address this scenario by solving a dynamic novelty discovery problem, with the aim of both re-identifying patterns seen in the past and highlighting new ones. We focus on traffic collected by Network Telescopes, a primary but noisy data source for cybersecurity analysis. We propose a 3-stage pipeline: (i) we learn compact representations (embeddings) of hosts from their traffic in a self-supervised fashion; (ii) via clustering, we distinguish groups of hosts performing similar activities; (iii) we track how the clusters evolve over time to highlight novel patterns. We apply our methodology to 20 days of telescope traffic, during which we observe more than 8,000 active hosts. Our results show that we efficiently identify 50-70 well-shaped clusters per day, 60-70% of which we associate with already analysed cases, while we pinpoint 10-20 previously unseen clusters per day. These correspond to activity changes and new incidents, some of which we document. In short, our novelty discovery methodology greatly simplifies the manual analysis that security analysts must conduct to interpret novel coordinated activities.
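To make stage (iii) more concrete, the following is a minimal sketch of how one day's clusters might be compared against previously analysed ones. The abstract does not specify the matching criterion, so everything here is an assumption made for illustration: clusters are represented as sets of host addresses, similarity is measured with the Jaccard index on host membership, and the names `track_clusters`, `jaccard`, and the `threshold` of 0.5 are hypothetical. The actual methodology may well compare clusters in embedding space instead.

```python
from typing import Dict, List, Optional, Set, Tuple

# Assumed representation: cluster label -> set of host IP addresses.
Clusters = Dict[str, Set[str]]


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two host sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def track_clusters(
    known: Clusters, today: Clusters, threshold: float = 0.5
) -> Tuple[Dict[str, str], List[str]]:
    """Match today's clusters to known ones.

    Returns (matches, novel): `matches` maps each of today's labels to the
    most similar known label when the similarity reaches `threshold`;
    `novel` lists today's labels that have no such match.
    """
    matches: Dict[str, str] = {}
    novel: List[str] = []
    for label, hosts in today.items():
        best_label: Optional[str] = None
        best_score = 0.0
        for known_label, known_hosts in known.items():
            score = jaccard(hosts, known_hosts)
            if score > best_score:
                best_label, best_score = known_label, score
        if best_label is not None and best_score >= threshold:
            matches[label] = best_label
        else:
            novel.append(label)
    return matches, novel


# Toy data using documentation-only address ranges (not real telescope traffic).
known = {
    "scanner_A": {"192.0.2.1", "192.0.2.2", "192.0.2.3"},
    "botnet_B": {"198.51.100.1", "198.51.100.2"},
}
today = {
    "c0": {"192.0.2.1", "192.0.2.2", "192.0.2.4"},  # overlaps scanner_A
    "c1": {"203.0.113.1", "203.0.113.2"},           # hosts not seen before
}

matches, novel = track_clusters(known, today)
print("re-identified:", matches)
print("novel:", novel)
```

With this toy data, `c0` shares two of four distinct hosts with `scanner_A` and is re-identified, while `c1` overlaps with nothing and is flagged as novel. Membership overlap is only one possible criterion: it is simple to inspect, but it would miss a group whose hosts change entirely while its behaviour stays the same, which is a case where comparing embeddings would be more appropriate.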