Enhanced Web Payload Classification Using WAMM: An AI-Based Framework for Dataset Refinement and Model Evaluation

Web applications increasingly face evasive and polymorphic attack payloads, yet traditional web application firewalls (WAFs) built on static rule sets such as the OWASP Core Rule Set (CRS) often miss obfuscated or zero-day patterns unless they are extensively tuned by hand. This work introduces WAMM, an AI-driven multiclass web attack detection framework that exposes the limitations of rule-based systems by reclassifying HTTP requests into OWASP-aligned categories for a specific technology stack. WAMM applies a multi-phase enhancement pipeline to the SR-BH 2020 dataset, consisting of large-scale deduplication, LLM-guided relabeling, realistic attack data augmentation, and LLM-based filtering, and produces three refined datasets. Four machine learning and deep learning models are evaluated on a unified feature space built from statistical and text-based representations. When trained on the augmented, LLM-filtered dataset for the same technology stack, XGBoost reaches 99.59% accuracy with microsecond-level inference, whereas the deep learning models degrade under noisy augmentation. When compared against OWASP CRS on an unseen augmented dataset, WAMM achieves true positive block rates between 96% and 100%, an improvement of up to 86% over CRS. These findings expose gaps in widely deployed rule-based defenses and demonstrate that curated training pipelines combined with efficient machine learning models enable a more resilient, real-time approach to web attack detection suitable for production WAF environments.
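
To make the deduplication and statistical-feature steps concrete, the following is a minimal sketch and not the WAMM implementation. The normalization rule (a single URL-decode, lowercasing, whitespace collapsing), the four features, and all function names are assumptions chosen for illustration; the abstract does not specify them. The text-based representations and the classifier are omitted.

```python
import math
from collections import Counter
from urllib.parse import unquote_plus


def normalize(payload: str) -> str:
    """URL-decode once, lowercase, and collapse whitespace (assumed rule)."""
    return " ".join(unquote_plus(payload).lower().split())


def deduplicate(payloads):
    """Keep the first occurrence of each normalized payload, preserving order."""
    seen = set()
    unique = []
    for p in payloads:
        key = normalize(p)
        if key not in seen:
            seen.add(key)
            unique.append(p)
    return unique


def shannon_entropy(text: str) -> float:
    """Character-level Shannon entropy in bits; 0.0 for an empty string."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def statistical_features(payload: str) -> dict:
    """Hypothetical statistical features computed on the normalized payload."""
    text = normalize(payload)
    n = len(text) or 1  # avoid division by zero for empty payloads
    return {
        "length": len(text),
        "entropy": shannon_entropy(text),
        "digit_ratio": sum(ch.isdigit() for ch in text) / n,
        "special_ratio": sum(
            not ch.isalnum() and not ch.isspace() for ch in text
        ) / n,
    }


# Illustrative requests: the first two differ only in URL encoding.
requests = [
    "/search?q=1%27%20OR%20%271%27%3D%271",
    "/search?q=1' OR '1'='1",
    "/index.html",
]
unique_requests = deduplicate(requests)
features = [statistical_features(r) for r in unique_requests]
```

In this sketch the two encodings of the same injection string collapse to one entry, so near-duplicate requests do not inflate a class. Vectors like these could then be concatenated with a text representation such as TF-IDF before being passed to a classifier, though that combination is likewise an assumption.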
