Phishing is one of the most effective ways in which cybercriminals obtain sensitive details from potential victims, such as credentials for online banking and digital wallets, state secrets, and more. They do this by spamming users with malicious URLs whose sole purpose is to trick them into divulging sensitive information, which is later used for various cybercrimes. In this research, we conducted a comprehensive review of current state-of-the-art machine learning and deep learning phishing detection techniques to expose their vulnerabilities and identify future research directions. For clearer analysis, we split machine learning techniques into Bayesian, non-Bayesian, and deep learning approaches. We reviewed the most recent advances in Bayesian and non-Bayesian classifiers and then exposed their corresponding weaknesses to indicate future research directions. While examining the weaknesses of both Bayesian and non-Bayesian classifiers, we also compared the performance of each with that of a deep learning classifier. For the review of deep learning-based classifiers, we considered Recurrent Neural Networks (RNNs), Convolutional Neural Networks (CNNs), and Long Short-Term Memory (LSTM) networks. We performed an empirical analysis to evaluate the performance of each classifier, along with many of the proposed state-of-the-art anti-phishing techniques, in order to identify future research directions. We also made a series of proposals on how the performance of the under-performing algorithm can be improved, and proposed a two-stage prediction model.