With the rise of sophisticated scam websites that exploit human psychological vulnerabilities, distinguishing legitimate websites from scam websites has become increasingly challenging. This paper presents ScamFerret, an agent system that employs a large language model (LLM) to autonomously collect and analyze data from a given URL and determine whether the site is a scam. Unlike traditional machine learning models, which require large datasets and feature engineering, ScamFerret leverages the natural language understanding of LLMs to identify scam websites of various types and in various languages without additional training or fine-tuning. Our evaluation shows that ScamFerret achieves 0.972 accuracy in classifying four scam types in English and 0.993 accuracy in classifying online shopping websites across three languages, particularly when using GPT-4. Furthermore, we confirmed that ScamFerret collects and analyzes external information such as web content, DNS records, and user reviews as necessary, providing a basis for identifying scam websites from multiple perspectives. These results suggest that LLMs have significant potential to enhance cybersecurity measures against sophisticated scam websites.
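The collect-then-decide loop described in the abstract can be illustrated with a minimal sketch. This is not ScamFerret's implementation: the tool names, the canned `FAKE_WORLD` observations, the `RED_FLAGS` keywords, and the `stub_policy` function are all hypothetical stand-ins. In particular, `stub_policy` replaces the LLM with a fixed rule that calls every tool once, whereas the system described above lets the model decide which information to gather as necessary.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Tuple

# Canned observations standing in for live lookups (all values are invented).
FAKE_WORLD: Dict[str, Dict[str, str]] = {
    "https://shop.example": {
        "web_content": "90% off luxury watches, pay by gift card only",
        "dns_records": "domain registered 3 days ago",
        "user_reviews": "never received my order",
    },
    "https://store.example": {
        "web_content": "standard catalogue with returns policy",
        "dns_records": "domain registered 12 years ago",
        "user_reviews": "fast delivery, good support",
    },
}

# Hypothetical keywords the stub policy treats as warning signs.
RED_FLAGS = ("gift card", "days ago", "never received")


def make_tool(kind: str) -> Callable[[str], str]:
    """Build a tool that returns the canned observation of one kind for a URL."""
    def tool(url: str) -> str:
        return FAKE_WORLD.get(url, {}).get(kind, "no data")
    return tool


TOOLS: Dict[str, Callable[[str], str]] = {
    kind: make_tool(kind) for kind in ("web_content", "dns_records", "user_reviews")
}


@dataclass
class Verdict:
    url: str
    is_scam: bool
    evidence: Dict[str, str]


def stub_policy(evidence: Dict[str, str]) -> Tuple[str, str]:
    """Stand-in for the LLM: request each unused tool, then answer."""
    for name in TOOLS:
        if name not in evidence:
            return ("call", name)
    hits = sum(flag in text for text in evidence.values() for flag in RED_FLAGS)
    return ("answer", "scam" if hits >= 2 else "legitimate")


def run_agent(url: str, policy=stub_policy, max_steps: int = 10) -> Verdict:
    """Alternate between policy decisions and tool calls until a verdict."""
    evidence: Dict[str, str] = {}
    for _ in range(max_steps):
        action, arg = policy(evidence)
        if action == "answer":
            return Verdict(url, arg == "scam", evidence)
        evidence[arg] = TOOLS[arg](url)
    # Step budget exhausted without an answer: report as not flagged.
    return Verdict(url, False, evidence)
```

Calling `run_agent("https://shop.example")` returns a `Verdict` whose `evidence` field records what each tool reported, which mirrors the idea of providing a basis for the decision from multiple perspectives. Swapping `stub_policy` for a function that queries an LLM would leave the loop itself unchanged.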