Deep learning has recently advanced in various fields, including computer vision, natural language processing, and cybersecurity. Machine learning (ML) has shown promise as a tool for anomaly-detection-based intrusion detection systems that help secure computer networks. ML approaches are increasingly adopted over heuristic approaches in cybersecurity because they learn directly from data. Data is critical to the development of ML systems, which makes it a potential target for attackers. Data poisoning, or contamination, is one of the most common techniques used to fool ML models through data. This paper evaluates the robustness of six recent deep learning algorithms for intrusion detection on contaminated data. Our experiments suggest that the state-of-the-art algorithms used in this study are sensitive to data contamination, and they reveal the importance of self-defense against data perturbation when developing novel models, especially for intrusion detection systems.