Phishing detection is a critical cybersecurity task: identifying and neutralizing fraudulent attempts to obtain sensitive information, and thereby protecting individuals and organizations from data breaches and financial loss. In this project, we address the constraints of traditional reference-based phishing detection by developing an LLM agent framework. The agent uses Large Language Models to actively fetch and use online information, providing a dynamic reference system for more accurate phishing detection. This removes the need for a static knowledge base and makes automated security measures more adaptable and efficient. The report begins with an initial study and problem analysis of existing solutions, which motivated the new framework. We then demonstrate the framework with LLMs acting as agents and detail the techniques required to construct it, followed by a complete proof-of-concept implementation and experiments that evaluate our solution's performance against similar solutions. The results show that our approach achieves an accuracy of 0.945, outperforming the existing solution (DynaPhish) by 0.445. We also discuss the limitations of our approach and suggest improvements that could make it more effective. Overall, the proposed framework has the potential to enhance current reference-based phishing detection approaches and could be adapted for real-world applications.