NoSQL injection attacks are a class of cybersecurity attacks in which an attacker sends a specially crafted query to a NoSQL database, causing it to perform an unauthorized operation. Rule-based systems were initially developed to defend against such attacks, but they proved ineffective against novel injection techniques, so model-based approaches were developed. Most model-based detection systems reported very strong results in testing, but they were trained only on the query statements sent to the server. Owing to data scarcity and class imbalance, however, these systems were not effective against all attacks in the real world. This paper explores classifying NoSQL injection attacks sent to a MongoDB server using log data and other extracted features, excluding raw query statements. The log data were collected from a simulated attack on an empty MongoDB server and then processed and explored. A discriminant analysis was carried out to identify features that discriminate significantly between injection and benign queries, yielding a dataset of statistically significant features. Several machine learning classification models built with the AutoML library FLAML, together with six manually programmed models, were trained on 50 randomized samples of this dataset, cross-validated, and evaluated. The best model was FLAML's "XGBoost limited depth" model, with an accuracy of 71%.
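The evaluation protocol summarized above (training on many randomized samples, cross-validating each, and averaging accuracy) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' code: a nearest-centroid classifier stands in for the FLAML and manually programmed models, the data are synthetic two-feature rows rather than the MongoDB log features, and the 80% sample fraction, k = 5 folds, and all function names are hypothetical choices.

```python
import random
from statistics import mean

# Hypothetical data: each row is (feature_vector, label), label 1 = injection,
# 0 = benign. Values are synthetic Gaussians for illustration only; they are
# NOT the paper's log-derived features.
def make_synthetic_data(n_per_class, seed=0):
    rng = random.Random(seed)
    rows = []
    for _ in range(n_per_class):
        rows.append(([rng.gauss(1.0, 1.0), rng.gauss(0.5, 1.0)], 1))
        rows.append(([rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)], 0))
    return rows

def fit_centroids(rows):
    """Stand-in classifier: store the mean feature vector of each class."""
    centroids = {}
    for label in {y for _, y in rows}:
        vecs = [x for x, y in rows if y == label]
        centroids[label] = [mean(col) for col in zip(*vecs)]
    return centroids

def predict(centroids, x):
    """Assign x to the class whose centroid is nearest (squared Euclidean)."""
    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def k_fold_accuracy(rows, k, rng):
    """Shuffle, split into k folds, and return the mean held-out accuracy."""
    rows = rows[:]
    rng.shuffle(rows)
    folds = [rows[i::k] for i in range(k)]
    scores = []
    for i in range(k):
        test = folds[i]
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        model = fit_centroids(train)
        correct = sum(predict(model, x) == y for x, y in test)
        scores.append(correct / len(test))
    return mean(scores)

def repeated_evaluation(rows, n_samples=50, sample_frac=0.8, k=5, seed=42):
    """Cross-validate on n_samples random subsamples; return one score each."""
    rng = random.Random(seed)
    results = []
    for _ in range(n_samples):
        sample = rng.sample(rows, int(len(rows) * sample_frac))
        results.append(k_fold_accuracy(sample, k, rng))
    return results

if __name__ == "__main__":
    data = make_synthetic_data(200)
    accs = repeated_evaluation(data)
    print(f"{len(accs)} samples, mean accuracy {mean(accs):.3f}")
```

Averaging over many random samples reduces the variance of any single train/test split, which matters when labelled injection data are scarce; swapping in a real model only requires replacing `fit_centroids` and `predict`.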