In this paper, we propose HyperVision, a real-time unsupervised machine learning (ML) based malicious traffic detection system. In particular, HyperVision detects unknown patterns of encrypted malicious traffic by using a compact in-memory graph built from traffic patterns. The graph captures flow interaction patterns, represented by graph structural features, rather than the features of specific known attacks. We develop an unsupervised graph learning method that detects abnormal interaction patterns by analyzing the connectivity, sparsity, and statistical features of the graph, which allows HyperVision to detect various kinds of encrypted attack traffic without requiring any labeled datasets of known attacks. Moreover, we establish an information theory model to demonstrate that the information preserved by the graph approaches the ideal theoretical bound. We evaluate HyperVision in real-world experiments with 92 datasets, including 48 attacks with encrypted malicious traffic. The experimental results show that HyperVision achieves at least 0.92 AUC and 0.86 F1, significantly outperforming the state-of-the-art methods. In particular, more than 50% of the attacks in our experiments can evade all of these methods. Moreover, HyperVision achieves at least 80.6 Gb/s detection throughput with an average detection latency of 0.83 s.
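To make the graph-based idea in the abstract concrete, the following is a minimal sketch of building an in-memory interaction graph from flow records and inspecting its connectivity and degree statistics. It is not HyperVision's actual algorithm: the flow tuple layout `(src, dst, nbytes)`, the function names `build_graph`, `connected_components`, and `flag_high_fanout`, and the "degree above a multiple of the median" rule are all assumptions chosen purely for illustration.

```python
from collections import defaultdict
from statistics import median


def build_graph(flows):
    """Build an undirected in-memory graph from (src, dst, nbytes) records.

    Vertices are addresses; each edge accumulates the bytes exchanged.
    The record layout is a hypothetical simplification.
    """
    adj = defaultdict(lambda: defaultdict(int))
    for src, dst, nbytes in flows:
        adj[src][dst] += nbytes
        adj[dst][src] += nbytes
    return adj


def connected_components(adj):
    """Return the connected components of the graph as lists of vertices."""
    seen, components = set(), []
    for start in list(adj):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:
            vertex = stack.pop()
            component.append(vertex)
            for neighbor in adj[vertex]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    stack.append(neighbor)
        components.append(component)
    return components


def flag_high_fanout(adj, factor=5.0):
    """Flag vertices whose degree exceeds factor times the median degree.

    This threshold rule is an illustrative stand-in for a real
    unsupervised detector, not the method from the paper.
    """
    degrees = {vertex: len(neighbors) for vertex, neighbors in adj.items()}
    typical = median(degrees.values())
    return sorted(v for v, d in degrees.items() if d > factor * typical)


# Synthetic example: one host contacts 20 peers with small flows,
# alongside two ordinary host pairs exchanging larger flows.
flows = [("10.0.0.9", f"10.0.1.{i}", 60) for i in range(20)]
flows += [("10.0.2.1", "10.0.2.2", 5000), ("10.0.2.3", "10.0.2.4", 7000)]

graph = build_graph(flows)
print("components:", len(connected_components(graph)))
print("high fan-out vertices:", flag_high_fanout(graph))
```

The point of the sketch is only that structural features such as component membership and vertex degree can be computed without any attack labels, so an unusual interaction pattern can stand out even when payloads are encrypted.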