Large Language Models for Cyber Security: A Systematic Literature Review

The rapid advancement of Large Language Models (LLMs) has opened up new opportunities for applying artificial intelligence in many domains, including cybersecurity. As cyber threats grow in volume and sophistication, there is an increasing need for intelligent systems that can automatically detect vulnerabilities, analyze malware, and respond to attacks. In this survey, we review the literature on the application of LLMs in cybersecurity (LLM4Security). By collecting over 30K relevant papers and systematically analyzing 127 papers from top security and software engineering venues, we aim to provide a holistic view of how LLMs are being used to solve diverse problems across the cybersecurity domain. Our analysis yields several key findings. First, LLMs are being applied to a wide range of cybersecurity tasks, including vulnerability detection, malware analysis, network intrusion detection, and phishing detection. Second, the datasets used for training and evaluating LLMs in these tasks are often limited in size and diversity, highlighting the need for more comprehensive and representative datasets. Third, we identify several promising techniques for adapting LLMs to specific cybersecurity domains, such as fine-tuning, transfer learning, and domain-specific pre-training. Finally, we discuss the main challenges and opportunities for future research in LLM4Security, including the need for more interpretable and explainable models, the importance of addressing data privacy and security concerns, and the potential for leveraging LLMs for proactive defense and threat hunting. Overall, our survey provides an overview of the current state of the art in LLM4Security and identifies promising directions for future research.
