We introduce ARACNE, a fully autonomous LLM-based pentesting agent tailored for SSH services that can execute commands on real Linux shell systems. ARACNE introduces a new agent architecture with support for multiple LLMs. Experiments show that ARACNE reaches a 60\% success rate against the autonomous defender ShelLM and a 57.58\% success rate on the Over The Wire Bandit CTF challenges, improving over the state of the art. In successful runs, the agent took fewer than 5 actions on average to accomplish its goals. The results suggest that using multiple LLMs is a promising approach to increasing the accuracy of the agent's actions.