Can Large Language Models (LLMs) understand non-language network data and help detect unknown malicious flows, and if so, how? This paper takes Carpet Bombing as a case study and shows how the capabilities of LLMs can be exploited in the networking domain. Carpet Bombing is a new type of DDoS attack that has increased dramatically in recent years and significantly threatens network infrastructure. It targets multiple victim IPs within subnets, congesting access links and disrupting network services for a vast number of users. Because these attacks are low-rate and multi-vector, they challenge traditional DDoS defenses. We propose DoLLM, a DDoS detection model that uses open-source LLMs as its backbone. By reorganizing non-contextual network flows into Flow-Sequences and projecting them into the LLM's semantic space as token embeddings, DoLLM leverages the LLM's contextual understanding to extract flow representations in the overall network context. These representations are used to improve DDoS detection performance. We evaluate DoLLM on the public CIC-DDoS2019 dataset and on real NetFlow traces from a top-3 countrywide ISP. The results show that DoLLM has strong detection capability: its F1 score increases by up to 33.3% in zero-shot scenarios and by at least 20.6% on the real ISP traces.
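The idea of turning independent flows into Flow-Sequences and projecting them into an embedding space can be illustrated with a minimal sketch. The abstract does not specify how DoLLM builds its sequences or its projection, so everything below is an assumption made for illustration: the time-window grouping, the field names (`start`, `packets`, `bytes`, `duration`), the helper names, and the random linear map, which stands in for a projection that would normally be learned.

```python
import math
import random
from collections import defaultdict

# Hypothetical per-flow features; the real feature set is not given in the abstract.
FEATURES = ("packets", "bytes", "duration")


def build_flow_sequences(flows, window_seconds=5.0, max_len=4):
    """Group flows into time-ordered sequences (assumed grouping rule)."""
    buckets = defaultdict(list)
    for flow in sorted(flows, key=lambda f: f["start"]):
        buckets[int(flow["start"] // window_seconds)].append(flow)
    sequences = []
    for key in sorted(buckets):
        bucket = buckets[key]
        # Split long windows so that no sequence exceeds max_len flows.
        for i in range(0, len(bucket), max_len):
            sequences.append(bucket[i:i + max_len])
    return sequences


def make_projection(n_features, d_model, seed=0):
    """Random linear map standing in for a learned projection layer."""
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(n_features)
    return [[rng.gauss(0.0, scale) for _ in range(d_model)]
            for _ in range(n_features)]


def project_flow(features, weights):
    """Map one flow's feature vector to a d_model-dimensional embedding."""
    d_model = len(weights[0])
    return [sum(x * weights[i][j] for i, x in enumerate(features))
            for j in range(d_model)]


def embed_sequence(sequence, weights, feature_names=FEATURES):
    """Turn a sequence of flows into a sequence of token-like embeddings."""
    return [project_flow([float(flow[name]) for name in feature_names], weights)
            for flow in sequence]


flows = [
    {"start": 0.2, "packets": 3, "bytes": 180, "duration": 0.1},
    {"start": 1.4, "packets": 2, "bytes": 120, "duration": 0.3},
    {"start": 6.0, "packets": 40, "bytes": 52000, "duration": 2.0},
]
sequences = build_flow_sequences(flows)
weights = make_projection(len(FEATURES), d_model=8)
embeddings = [embed_sequence(seq, weights) for seq in sequences]
```

In a full pipeline, each list in `embeddings` would be passed to the LLM backbone in place of ordinary word embeddings, so that every flow is represented in the context of its neighbors; feature normalization and training of the projection are omitted here.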