Detecting malicious websites has become a critical problem in cybersecurity. This paper presents a comprehensive review of data-driven methods for malicious website detection. It first discusses traditional approaches and their limitations, then gives an overview of data-driven approaches. The paper organizes recent research along a data-feature-model-extension pipeline, covering data preprocessing, feature extraction, model construction, and technology extension. In particular, it compares deep learning-based methods proposed in recent years. Finally, following the same pipeline, the paper discusses open challenges and future directions for data-driven malicious website detection.