This paper explores the value of agentic AI tools for cybersecurity. We evaluate the efficacy of a general-purpose agent based on a generative AI (GenAI) large language model (LLM) when powered by three different Ollama-hosted, general-purpose, open-source models. We assess each configuration's performance using precision, recall, false positive count, and a composite score derived from the interplay of those metrics, and compare it against the baseline performance of Bandit, an existing, vetted Static Application Security Testing (SAST) tool. Our findings refute the notion that a modern open-source GenAI LLM-based agent is currently suitable for the specialized task of SAST scanning under realistic conditions.