Phishing attacks are a growing cybersecurity threat that uses deceptive techniques to steal sensitive information through malicious websites. To combat these attacks, this paper introduces PhishGuard, an optimized custom ensemble model designed to improve phishing site detection. The model combines multiple machine learning classifiers, including Random Forest, Gradient Boosting, CatBoost, and XGBoost, to enhance detection accuracy. It was trained and evaluated on four publicly available datasets, using feature selection methods such as SelectKBest and RFECV together with optimizations such as hyperparameter tuning and data balancing. PhishGuard outperformed state-of-the-art models, achieving a detection accuracy of 99.05% on one of the datasets, with similarly high results on the others. This research demonstrates that combining optimization methods with ensemble learning substantially improves phishing detection performance.
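The combination of feature selection and ensemble learning described in the abstract can be illustrated with a minimal sketch. It is not the PhishGuard implementation, and several things in it are assumptions: the data is synthetic (generated with `make_classification`, not one of the four phishing datasets), the base classifiers are combined by soft voting (the abstract does not state how they are combined), and only Random Forest and Gradient Boosting are included so that the example depends on scikit-learn alone. CatBoost, XGBoost, RFECV, hyperparameter tuning, and data balancing are omitted. The names `build_pipeline` and `evaluate` are hypothetical.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import (
    GradientBoostingClassifier,
    RandomForestClassifier,
    VotingClassifier,
)
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline


def build_pipeline(k_features: int = 10, seed: int = 0) -> Pipeline:
    """Univariate feature selection followed by a voting ensemble.

    Soft voting is an assumption made for illustration; it averages the
    class probabilities predicted by the base classifiers.
    """
    ensemble = VotingClassifier(
        estimators=[
            ("rf", RandomForestClassifier(n_estimators=100, random_state=seed)),
            ("gb", GradientBoostingClassifier(random_state=seed)),
        ],
        voting="soft",
    )
    return Pipeline(
        [
            # Keep the k features with the highest ANOVA F-scores.
            ("select", SelectKBest(score_func=f_classif, k=k_features)),
            ("ensemble", ensemble),
        ]
    )


def evaluate(seed: int = 0) -> float:
    """Train on synthetic data and return held-out accuracy."""
    # Synthetic stand-in for a phishing dataset (30 features, 8 informative).
    X, y = make_classification(
        n_samples=600, n_features=30, n_informative=8, random_state=seed
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    model = build_pipeline(seed=seed)
    model.fit(X_train, y_train)
    return accuracy_score(y_test, model.predict(X_test))


if __name__ == "__main__":
    print(f"held-out accuracy on synthetic data: {evaluate():.3f}")
```

Placing the selector inside the `Pipeline` means it is fitted only on the training split, which avoids leaking information from the test data into feature selection. The accuracy printed here reflects the synthetic data only and says nothing about the results reported in the paper.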