Malicious Cyber Activity Detection Using Zigzag Persistence

Audun Myers, Alyson Bittner, Sinan Aksoy, Daniel M. Best, Gregory Henselman-Petrusek, Helen Jenne, Cliff Joslyn, Bill Kay, Garret Seppala, Stephen J. Young, Emilie Purvine

In this study, we combine zigzag persistence from topological data analysis with autoencoder-based approaches to detect malicious cyber activity and derive analytic insights. Cybersecurity aims to safeguard computers, networks, and servers from various forms of malicious attack, including network damage, data theft, and activity monitoring. Here we focus on detecting malicious activity using log data. To do this, we consider the dynamics of the data by exploring the changing topology of a hypergraph representation, which gives insight into the underlying activity. Hypergraphs provide a natural representation of cyber log data because they capture complex interactions between processes. To study the changing topology, we use zigzag persistence, which captures how topological features in multiple dimensions persist over time. We observe that the resulting barcodes represent malicious activity differently than benign activity. To automate this detection, we implement an autoencoder trained on a vectorization of the resulting zigzag persistence barcodes. Our experimental results demonstrate the effectiveness of the autoencoder in detecting malicious activity in comparison to standard summary statistics. Overall, this study highlights the potential of zigzag persistence, combined with temporal hypergraphs, for analyzing cybersecurity log data and detecting malicious behavior.
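To make the vectorization step concrete, here is a minimal sketch of one common way to turn a barcode into a fixed-length vector: a Betti curve, which counts how many bars are alive at each sampled time. The abstract does not say which vectorization the authors use, so the function name `betti_curve`, the half-open interval convention, and the example barcode are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# A bar is a (birth, death) pair; assumed here to be the half-open interval [birth, death).
Bar = Tuple[float, float]


def betti_curve(bars: List[Bar], times: List[float]) -> List[int]:
    """Count how many bars are alive at each sample time (hypothetical helper)."""
    return [
        sum(1 for birth, death in bars if birth <= t < death)
        for t in times
    ]


# Hypothetical barcode for a single homology dimension; not data from the paper.
bars = [(0.0, 3.0), (1.0, 2.0), (2.0, 5.0)]
times = [0.0, 1.0, 2.0, 3.0, 4.0]

vector = betti_curve(bars, times)
print(vector)  # [1, 2, 2, 1, 1]
```

A vector like this, computed per time window and per homology dimension, is the kind of fixed-length input an autoencoder could be trained on, with a high reconstruction error flagging a window as unusual.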


