LLMHoney: A Real-Time SSH Honeypot with Large Language Model-Driven Dynamic Response Generation

Cybersecurity honeypots are deception tools for engaging attackers and gathering intelligence, but traditional low- or medium-interaction honeypots often rely on static, pre-scripted interactions that skilled adversaries can easily identify. This report presents LLMHoney, an SSH honeypot that leverages Large Language Models (LLMs) to generate realistic, dynamic command outputs in real time. LLMHoney integrates a dictionary-based virtual file system to handle common commands with low latency while using LLMs for novel inputs, balancing authenticity against performance. We implemented LLMHoney using open-source LLMs and evaluated it on a testbed of 138 representative Linux commands. We report accuracy (exact match, cosine similarity, Jaro-Winkler similarity, Levenshtein similarity, and BLEU score), response latency, and memory overhead. We evaluate LLMHoney with multiple LLM backends ranging from 0.36B to 3.8B parameters, including both open-source models and a proprietary model (Gemini). Our experiments compare 13 LLM variants; results show that Gemini-2.0 and the moderately sized models Qwen2.5:1.5B and Phi3:3.8B provide the most reliable and accurate responses, with mean latencies around 3 seconds, whereas smaller models often produce incorrect or out-of-character outputs. We also discuss how LLM integration improves honeypot realism and adaptability compared to traditional honeypots, as well as challenges such as occasional hallucinated outputs and increased resource usage. Our findings demonstrate that LLM-driven honeypots are a promising approach to enhancing attacker engagement and collecting richer threat intelligence.
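The hybrid design described in the abstract (a dictionary lookup for common commands, with an LLM fallback for novel inputs) can be illustrated with a minimal sketch. This is not the authors' implementation: the table `STATIC_RESPONSES`, the functions `respond` and `fake_llm`, and the canned outputs are all hypothetical names and values chosen for illustration, and the LLM call is replaced by a stub so the example runs offline.

```python
from typing import Callable, Dict

# Hypothetical static table standing in for the dictionary-based
# virtual file system; the entries are illustrative, not from the paper.
STATIC_RESPONSES: Dict[str, str] = {
    "whoami": "root",
    "pwd": "/root",
    "hostname": "ubuntu-server",
}


def fake_llm(command: str) -> str:
    """Stub for an LLM backend call (assumption: a real deployment
    would send the command to a model and return its generated output)."""
    return f"bash: {command.split()[0]}: command not found"


def respond(command: str, llm: Callable[[str], str] = fake_llm) -> str:
    """Answer from the static table when possible, else defer to the LLM."""
    key = command.strip()
    if not key:
        # An empty line at a shell prompt produces no output.
        return ""
    if key in STATIC_RESPONSES:
        # Fast path: no model call, so latency stays low.
        return STATIC_RESPONSES[key]
    # Slow path: novel input is handed to the (stubbed) model.
    return llm(key)
```

Under these assumptions, `respond("whoami")` is served from the table without touching the model, while `respond("foo --bar")` falls through to the `llm` callable. Passing the backend as a parameter is one way to swap models when comparing several of them, as the evaluation does.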

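Two of the accuracy metrics named in the abstract, exact match and Levenshtein similarity, are simple enough to sketch directly. The abstract does not say how the similarity is normalized, so the version below assumes the common convention of one minus the edit distance divided by the longer string's length; the function names are likewise illustrative rather than taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def levenshtein_similarity(expected: str, actual: str) -> float:
    """Assumed normalization: 1 - distance / max(len), in [0, 1]."""
    longest = max(len(expected), len(actual))
    if longest == 0:
        return 1.0  # two empty outputs are identical
    return 1.0 - levenshtein(expected, actual) / longest


def exact_match(expected: str, actual: str) -> bool:
    """Exact match after trimming surrounding whitespace (an assumption)."""
    return expected.strip() == actual.strip()
```

A honeypot response would be scored by comparing it against the output of the same command on a real system, for example `levenshtein_similarity(real_output, honeypot_output)`.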
