This paper presents the FlowTransformer framework, a novel approach for implementing transformer-based Network Intrusion Detection Systems (NIDSs). FlowTransformer leverages the strength of transformer models in identifying the long-term behaviour and characteristics of networks, which most existing NIDSs overlook. By capturing these complex patterns in network traffic, FlowTransformer offers researchers and practitioners in the cybersecurity community a flexible and efficient tool for implementing NIDSs with transformer-based models. The framework allows direct substitution of its individual components, namely the input encoding, the transformer, and the classification head, and the evaluation of each combination on any flow-based network dataset. To demonstrate the effectiveness and efficiency of the framework, we utilise it to provide an extensive evaluation of various common transformer architectures, such as GPT 2.0 and BERT, on three commonly used public NIDS benchmark datasets, reporting accuracy, model size and speed. A key finding of our evaluation is that the choice of classification head has the most significant impact on model performance. Surprisingly, Global Average Pooling, which is commonly used in text classification, performs very poorly in the context of NIDS. In addition, we show that model size can be reduced by over 50%, and inference and training times improved, with no loss of accuracy, by making specific choices of input encoding and classification head instead of other commonly used alternatives.
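To make the notion of a substitutable classification head concrete, the following is a minimal NumPy sketch of two heads that consume the same transformer output for a window of flows. It is not the FlowTransformer API: the function names, the tensor layout `(batch, seq_len, d_model)`, the single sigmoid output, and the last-position head used for contrast are all assumptions chosen for illustration, and the sketch does not reproduce any result from the evaluation.

```python
import numpy as np


def sigmoid(x):
    """Logistic function, mapping logits to (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))


def global_average_pooling_head(encoded, weights, bias):
    """Hypothetical head: average over the flow sequence, then classify.

    encoded: (batch, seq_len, d_model) transformer output (assumed layout)
    weights: (d_model,) dense-layer weights, bias: scalar
    """
    pooled = encoded.mean(axis=1)          # (batch, d_model)
    return sigmoid(pooled @ weights + bias)  # (batch,)


def last_token_head(encoded, weights, bias):
    """Hypothetical head: classify from the final flow's representation only."""
    last = encoded[:, -1, :]               # (batch, d_model)
    return sigmoid(last @ weights + bias)  # (batch,)


# Toy input: 2 windows of 8 flows, 4 features each. Only the final flow
# in each window carries a non-zero representation.
encoded = np.zeros((2, 8, 4))
encoded[:, -1, :] = 1.0
weights = np.ones(4)
bias = 0.0

gap_scores = global_average_pooling_head(encoded, weights, bias)
last_scores = last_token_head(encoded, weights, bias)
```

Because both functions share one signature, either can be passed wherever a head is expected, which is the kind of direct substitution the framework is described as supporting. In this toy input the averaged head sees the final flow's signal divided by the window length, while the last-position head sees it undiluted; this shows only the mechanical difference between the two heads.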