There are very few state-of-the-art (SotA) deception systems based on Large Language Models (LLMs), and the existing ones are limited to simulating only one type of service, mainly SSH shells. These systems, as well as deception technologies not based on LLMs, lack extensive evaluation that includes human attackers. Generative AI has recently become a valuable asset for cybersecurity researchers and practitioners, and the field of cyber deception is no exception. Researchers have demonstrated how LLMs can be leveraged to create realistic-looking honeytokens, fake users, and even simulated systems that can be used as honeypots. This paper presents an AI-based deception framework called VelLMes, which can simulate multiple protocols and services, such as an SSH Linux shell, MySQL, POP3, and HTTP. All of these can be deployed and used as honeypots; VelLMes thus offers a variety of choices for deception design based on users' needs. VelLMes is designed to be attacked by humans, so interactivity and realism are key to its performance. We evaluate both its generative capabilities and its deception capabilities. Generative capabilities were evaluated using unit tests for LLMs. The results of the unit tests show that, with careful prompting, LLMs can produce realistic-looking responses, with some LLMs achieving a 100% passing rate. For the SSH Linux shell, we evaluated deception capabilities with 89 human attackers. The results showed that about 30% of the attackers who were assigned an LLM-based honeypot thought they were interacting with a real system. Lastly, we deployed 10 instances of the SSH Linux shell honeypot on the Internet to capture real-life attacks. Analysis of these attacks showed that LLM-based honeypots simulating Linux shells can perform well against unstructured and unexpected attacks on the Internet, responding correctly to most of the issued commands.