A Flow is a Stream of Packets: A Stream-Structured Data Approach for DDoS Detection

Distributed Denial of Service (DDoS) attacks are increasingly harmful to the Internet and show no signs of slowing down. Developing an accurate detection mechanism to thwart them remains a major challenge because of the rich variety of these attacks and the continual emergence of new attack vectors. In this paper, we propose a new tree-based DDoS detection approach that operates on a flow as a stream structure, rather than on the traditional fixed-size record structure containing aggregated flow statistics. Aggregated flow records have gained popularity over the past decade because they provide an effective means for flow-based intrusion detection while inspecting only a fraction of the total traffic volume, but they are inherently constrained. Their detection precision is limited not only by the lack of packet payloads but also by their structure, which cannot model fine-grained inter-packet relations such as packet order and temporal relations. Additionally, aggregated flow statistics cannot be computed until the flow has ended. Here we show that treating each flow as a variable-length stream composed of its packet headers allows for very accurate and fast detection of malicious flows. We evaluate our proposed strategy on the CICDDoS2019 and CICIDS2017 datasets, which contain a comprehensive variety of DDoS attacks. Our approach matches or exceeds the accuracy of existing machine learning techniques, including state-of-the-art deep learning methods. Furthermore, our method achieves significantly earlier detection: on CICDDoS2019, for example, detection is based on the first 2 packets of a flow, which corresponds to an average time saving of 99.79% and uses only 4-6% of the traffic volume.
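To make the contrast between the two input structures concrete, the following is a minimal Python sketch, not the authors' implementation. It groups packet headers into per-flow streams, extracts per-packet features from the first n packets of a stream, and builds a fixed-size aggregated record for comparison. The field names, the unidirectional 5-tuple flow key, the chosen per-packet features, and the helper names (`PacketHeader`, `group_into_streams`, `stream_prefix`, `aggregated_record`) are all assumptions made for illustration; the tree-based classifier itself is not shown.

```python
from collections import defaultdict
from typing import NamedTuple


class PacketHeader(NamedTuple):
    """Hypothetical subset of header fields; the paper's feature set may differ."""
    timestamp: float
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    protocol: int
    length: int
    tcp_flags: int


def flow_key(p):
    # Assumption: a unidirectional 5-tuple identifies a flow.
    return (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.protocol)


def group_into_streams(packets):
    """Map each flow key to its packets in arrival order (a variable-length stream)."""
    streams = defaultdict(list)
    for p in sorted(packets, key=lambda pkt: pkt.timestamp):
        streams[flow_key(p)].append(p)
    return dict(streams)


def stream_prefix(stream, n=2):
    """Per-packet features (length, flags, inter-arrival gap) for the first n packets.

    Packet order and timing are preserved, and the result is available
    as soon as n packets have arrived.
    """
    features = []
    previous = None
    for p in stream[:n]:
        gap = 0.0 if previous is None else p.timestamp - previous.timestamp
        features.append((p.length, p.tcp_flags, gap))
        previous = p
    return features


def aggregated_record(stream):
    """Fixed-size summary; it needs the complete flow and discards packet order."""
    return {
        "packets": len(stream),
        "bytes": sum(p.length for p in stream),
        "duration": stream[-1].timestamp - stream[0].timestamp,
    }


# Made-up sample traffic: two flows interleaved in time.
packets = [
    PacketHeader(0.0, "10.0.0.1", "10.0.0.9", 4444, 80, 6, 60, 0x02),
    PacketHeader(0.25, "10.0.0.1", "10.0.0.9", 4444, 80, 6, 60, 0x02),
    PacketHeader(0.5, "10.0.0.2", "10.0.0.9", 5555, 53, 17, 120, 0),
    PacketHeader(1.0, "10.0.0.1", "10.0.0.9", 4444, 80, 6, 1500, 0x18),
]

streams = group_into_streams(packets)
flow = streams[("10.0.0.1", "10.0.0.9", 4444, 80, 6)]
early_input = stream_prefix(flow, n=2)
summary = aggregated_record(flow)
```

In this sketch, `early_input` can be handed to a classifier after the second packet of the flow arrives, whereas `summary` can only be computed once the last packet has been seen.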
