Web attack detection is the first line of defense for web applications, designed to identify malicious activities preemptively. Deep learning-based approaches are increasingly popular because they automatically learn complex patterns and extract semantic features from HTTP requests, yielding superior detection performance. However, existing methods embed irregular HTTP requests poorly, and they fail to model unordered parameters or to support attack traceability. In this paper, we propose WADBERT, a web attack detection model that achieves high detection accuracy while precisely identifying malicious parameters. We first employ Hybrid Granularity Embedding (HGE) to generate fine-grained embeddings for URL and payload parameters. URLBERT and SecBERT then extract semantic features from the URL and the payload parameters, respectively. Next, the parameter-level features extracted by SecBERT are fused through a multi-head attention mechanism into a comprehensive payload feature. Finally, the URL and payload features are concatenated and fed into a linear classifier to obtain the detection result. Experiments on the CSIC2010 and SR-BH2020 datasets validate the efficacy of WADBERT: it achieves F1-scores of 99.63% and 99.50%, respectively, and significantly outperforms state-of-the-art methods.
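
The fusion and classification steps described in the abstract can be illustrated with a small, dependency-free sketch. It is not the WADBERT implementation: `toy_embed` is a hypothetical stand-in for the HGE, URLBERT, and SecBERT stages, the use of one fixed query vector per attention head is an assumption, the classifier weights are random and untrained, and the example request is invented. The sketch only shows how attention pooling can fuse a variable number of parameter features into one payload feature, how that feature is concatenated with the URL feature for a linear classifier, and how per-parameter attention weights could, under these assumptions, serve as a traceability signal.

```python
import math
import random
from urllib.parse import parse_qsl, urlsplit

DIM, HEADS = 8, 2  # toy sizes, chosen only for illustration


def split_request(url, body=""):
    """Split a request into its URL path and a list of (name, value) parameters."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    params += parse_qsl(body, keep_blank_values=True)
    return parts.path, params


def toy_embed(text, dim=DIM):
    """Hypothetical stand-in for a BERT encoder: a deterministic vector per string."""
    rng = random.Random(text)
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(vectors, queries):
    """Fuse parameter vectors with one query per head; return (pooled, mean weights)."""
    d_head = len(vectors[0]) // len(queries)
    pooled, per_head = [], []
    for h, query in enumerate(queries):
        # Each head attends over its own slice of every parameter vector.
        chunks = [v[h * d_head:(h + 1) * d_head] for v in vectors]
        scores = [sum(q * c for q, c in zip(query, chunk)) / math.sqrt(d_head)
                  for chunk in chunks]
        weights = softmax(scores)
        per_head.append(weights)
        pooled += [sum(w * chunk[j] for w, chunk in zip(weights, chunks))
                   for j in range(d_head)]
    # Average the heads to get one attention weight per parameter.
    mean_weights = [sum(ws[i] for ws in per_head) / len(per_head)
                    for i in range(len(vectors))]
    return pooled, mean_weights


def classify(url_vec, payload_vec, weights, bias=0.0):
    """Linear classifier with a sigmoid over the concatenated features."""
    features = url_vec + payload_vec
    logit = sum(w * x for w, x in zip(weights, features)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


rng = random.Random(0)
queries = [[rng.uniform(-1.0, 1.0) for _ in range(DIM // HEADS)] for _ in range(HEADS)]
clf_weights = [rng.uniform(-1.0, 1.0) for _ in range(2 * DIM)]  # untrained

# Invented request: two benign query parameters and one injected body parameter.
path, params = split_request("/shop/cart.jsp?id=2&name=apple",
                             "qty=1%27+OR+%271%27%3D%271")
param_vecs = [toy_embed(f"{name}={value}") for name, value in params]
payload_vec, attn = attention_pool(param_vecs, queries)
score = classify(toy_embed(path), payload_vec, clf_weights)

for (name, value), weight in zip(params, attn):
    print(f"{name}={value!r}: attention {weight:.3f}")
print(f"score: {score:.3f}")
```

Because the pooled vector is a softmax-weighted sum, reordering the parameters leaves it unchanged up to floating-point rounding, which is one way to handle unordered parameters. With random weights the printed score and attention values carry no meaning; only the data flow is of interest.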