Large Language Models (LLMs) have shown remarkable proficiency in code generation tasks across various programming languages. However, their outputs often contain subtle but critical vulnerabilities, posing significant risks when deployed in security-sensitive or mission-critical systems. This paper introduces TypePilot, an agentic AI framework designed to enhance the security and robustness of LLM-generated code by leveraging strongly typed and verifiable languages, using Scala as a representative example. We evaluate the effectiveness of our approach in two settings: formal verification with the Stainless framework and general-purpose secure code generation. Our experiments with leading open-source LLMs reveal that direct code generation often fails to enforce safety constraints, as does naive prompting for more secure code, whereas our type-focused agentic pipeline substantially mitigates input validation and injection vulnerabilities. The results demonstrate the potential of structured, type-guided LLM workflows to advance the state of the art in the trustworthiness of automated code generation in high-assurance domains.