Phishing is a prevalent cyber threat that manipulates users into revealing sensitive private information through deceptive tactics designed to masquerade as trustworthy entities. Over the years, proactive detection of phishing URLs (or websites) has become established as a widely accepted defense approach. The literature often reports supervised Machine Learning (ML) models with highly competitive performance for detecting phishing websites based on features extracted from both phishing and benign (i.e., legitimate) websites. However, it is still unclear whether these features or indicators depend on a particular dataset or generalize to phishing detection overall. In this paper, we delve deeper into this issue by analyzing two publicly available phishing URL datasets, each of which has its own set of unique and overlapping features related to the URL string and website contents. We investigate whether overlapping features are similar in nature across datasets and how a model performs when trained on one dataset and tested on the other. We conduct practical experiments and leverage explainable AI (XAI) methods such as SHAP plots to provide insights into different features' contributions to phishing detection and to answer our primary question, ``Can features for phishing URL detection be trusted across diverse datasets?''. Our case study results show that features for phishing URL detection can often be dataset-dependent and thus may not be trusted across different datasets, even when the datasets share the same set of feature behaviors.
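The cross-dataset protocol described above (train on one dataset, test on the other, using only the overlapping features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the feature names (`url_length`, `num_dots`, `has_form`), the toy rows, and the one-feature threshold rule, which stands in for the supervised ML models the paper evaluates. The toy data is deliberately constructed so that the shared feature behaves differently in the two datasets.

```python
# Hypothetical sketch: cross-dataset evaluation restricted to overlapping features.
# Toy rows and a one-feature threshold rule replace real datasets and real models.

def shared_features(cols_a, cols_b):
    """Return feature names present in both datasets, keeping cols_a order."""
    in_b = set(cols_b)
    return [c for c in cols_a if c in in_b]

def predict(row, feature, threshold, sign):
    """sign=1: phishing (1) if value >= threshold; sign=-1: phishing if value < threshold."""
    above = row[feature] >= threshold
    return int(above) if sign == 1 else int(not above)

def accuracy(rows, feature, threshold, sign):
    """Fraction of rows whose predicted label matches the true label."""
    hits = sum(predict(r, feature, threshold, sign) == r["label"] for r in rows)
    return hits / len(rows)

def fit_stump(rows, feature):
    """Choose the threshold and direction that maximize training accuracy."""
    best = None
    for t in sorted({r[feature] for r in rows}):
        for sign in (1, -1):
            acc = accuracy(rows, feature, t, sign)
            if best is None or acc > best[0]:
                best = (acc, t, sign)
    return best[1], best[2]

# Assumed toy datasets: label 1 = phishing, 0 = benign.
# In dataset_a long URLs are phishing; in dataset_b the pattern mostly reverses.
dataset_a = [
    {"url_length": 20, "num_dots": 1, "label": 0},
    {"url_length": 25, "num_dots": 2, "label": 0},
    {"url_length": 70, "num_dots": 4, "label": 1},
    {"url_length": 90, "num_dots": 5, "label": 1},
]
dataset_b = [
    {"url_length": 80, "has_form": 0, "label": 0},
    {"url_length": 85, "has_form": 0, "label": 0},
    {"url_length": 30, "has_form": 1, "label": 1},
    {"url_length": 75, "has_form": 1, "label": 1},
]

cols_a = [c for c in dataset_a[0] if c != "label"]
cols_b = [c for c in dataset_b[0] if c != "label"]
shared = shared_features(cols_a, cols_b)

# Train on A using the first shared feature, then score on A and on B.
feature = shared[0]
threshold, sign = fit_stump(dataset_a, feature)
within_acc = accuracy(dataset_a, feature, threshold, sign)
cross_acc = accuracy(dataset_b, feature, threshold, sign)

print(f"shared features: {shared}")
print(f"within-dataset accuracy: {within_acc:.2f}")
print(f"cross-dataset accuracy:  {cross_acc:.2f}")
```

The gap between the two accuracies in this contrived example mirrors the question the paper raises: a feature that separates classes well in one dataset need not carry the same meaning in another, even when both datasets name and compute it identically.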