ModSec-AdvLearn: Countering Adversarial SQL Injections with Robust Machine Learning

Many Web Application Firewalls (WAFs) leverage the OWASP Core Rule Set (CRS) to block incoming malicious requests. The CRS consists of different sets of rules designed by domain experts to detect well-known web attack patterns. Both the set of rules to be used and the weights used to combine them are manually defined, yielding four different default configurations of the CRS. In this work, we focus on the detection of SQL injection (SQLi) attacks, and show that the manual configurations of the CRS typically yield a suboptimal trade-off between detection and false alarm rates. Furthermore, we show that these configurations are not robust to adversarial SQLi attacks, i.e., carefully crafted attacks that iteratively refine the malicious SQLi payload by querying the target WAF to bypass detection. To overcome these limitations, we propose (i) using machine learning to automate the selection of the set of rules to be combined along with their weights, i.e., customizing the CRS configuration based on the monitored web services; and (ii) leveraging adversarial training to significantly improve its robustness to adversarial SQLi manipulations. Our experiments, conducted using the well-known open-source ModSecurity WAF equipped with the CRS rules, show that our approach, named ModSec-AdvLearn, can (i) increase the detection rate by up to 30%, while retaining negligible false alarm rates and discarding up to 50% of the CRS rules; and (ii) improve robustness against adversarial SQLi attacks by up to 85%, marking a significant stride toward designing more effective and robust WAFs. We release our open-source code at https://github.com/pralab/modsec-advlearn.
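
To make point (i) concrete, the following is a minimal sketch, not the released implementation. It assumes each request is represented as a binary vector recording which rules fired, and it compares a hand-set scheme (equal weights summed against a fixed threshold) with weights learned from labeled traffic. L1-regularized logistic regression is used here only as a simple stand-in, since the abstract does not name the model; the four toy rules, the data, the weight and threshold values, and all function names are hypothetical. The L1 penalty is what can drive uninformative rule weights to exactly zero, which corresponds to discarding those rules.

```python
import math

def crs_score(fired, weights):
    """Hand-set baseline: sum the fixed weights of the rules that fired."""
    return sum(w for f, w in zip(fired, weights) if f)

def train_l1_logreg(X, y, lam=0.01, lr=0.5, epochs=500):
    """Fit logistic regression with an L1 penalty via proximal gradient descent.

    X: list of binary rule-activation vectors, y: 1 = attack, 0 = benign.
    """
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            z = b + sum(wj * xj for wj, xj in zip(w, xi))
            err = 1.0 / (1.0 + math.exp(-z)) - yi  # predicted prob. minus label
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        b -= lr * grad_b / n
        for j in range(d):
            step = w[j] - lr * grad_w[j] / n
            # Soft-thresholding: shrink toward zero, clipping small weights to 0.
            w[j] = math.copysign(max(abs(step) - lr * lam, 0.0), step)
    return w, b

def predict(x, w, b):
    """Flag the request as an attack when the learned score is positive."""
    return 1 if b + sum(wj * xj for wj, xj in zip(w, x)) > 0 else 0

# Hypothetical toy data: columns are rules 0..3.
# Rules 0 and 2 fire only on attacks; rule 1 also fires on benign traffic;
# rule 3 fires only on benign traffic.
X = [[1, 1, 0, 0], [1, 0, 1, 0], [0, 1, 1, 0], [1, 1, 1, 0],   # attacks
     [0, 1, 0, 0], [0, 0, 0, 0], [0, 1, 0, 1], [0, 0, 0, 1]]   # benign
y = [1, 1, 1, 1, 0, 0, 0, 0]

# Assumed manual configuration: every rule weighs 5, block at score >= 5.
manual_weights, threshold = [5, 5, 5, 5], 5
manual_preds = [1 if crs_score(x, manual_weights) >= threshold else 0 for x in X]

w, b = train_l1_logreg(X, y)
learned_preds = [predict(x, w, b) for x in X]

print("manual :", manual_preds)
print("learned:", learned_preds)
print("weights:", [round(v, 2) for v in w], "bias:", round(b, 2))
```

On this toy set the manual scheme blocks any request that triggers a single rule, including the benign ones that trigger only rule 1 or rule 3, whereas the learned weights are fit to the observed traffic. Point (ii), adversarial training, would additionally augment the training set with payloads mutated to evade the current model; that step is not shown here.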
