APILOT: Navigating Large Language Models to Generate Secure Code by Sidestepping Outdated API Pitfalls

With the rapid development of large language models (LLMs), their applications have expanded into diverse fields, including code assistance. However, the substantial size of LLMs makes training highly resource- and time-intensive, rendering frequent retraining or updates impractical. Consequently, time-sensitive knowledge can become outdated and mislead LLMs in time-aware tasks. For example, new vulnerabilities are discovered in software every day; without updated knowledge, an LLM may inadvertently generate code that contains them. Current strategies, such as prompt engineering and fine-tuning, do not effectively address this issue. We therefore propose APILOT, a solution that maintains a real-time, quickly updatable dataset of outdated APIs and uses an augmented generation method that leverages this dataset to steer LLMs toward generating secure, version-aware code. We conducted a comprehensive evaluation of APILOT's effectiveness in reducing outdated API recommendations across seven state-of-the-art LLMs. The results indicate that APILOT reduces outdated code recommendations by 89.42% on average with limited performance overhead. Interestingly, while enhancing security, APILOT also improves the usability of LLM-generated code by 27.54% on average. This underscores APILOT's dual capability to enhance both the safety and the practical utility of code suggestions in contemporary software development environments.

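To make the idea of dataset-guided generation concrete, the following is a minimal sketch, not APILOT's actual implementation. It assumes a small, hand-written table of outdated Python APIs (a real dataset would be collected and refreshed automatically), scans generated code for calls to those APIs, and appends version-aware constraints to the prompt so that the model can be asked to regenerate. The names `OutdatedAPI`, `OUTDATED_APIS`, `find_outdated_calls`, and `augment_prompt` are hypothetical and introduced only for illustration.

```python
import ast
from dataclasses import dataclass


@dataclass(frozen=True)
class OutdatedAPI:
    replacement: str  # suggested alternative API
    note: str         # why the API is flagged


# Illustrative entries only (assumption): a real dataset would be
# gathered from release notes or advisories and updated continuously.
OUTDATED_APIS = {
    "ssl.wrap_socket": OutdatedAPI(
        "ssl.SSLContext.wrap_socket",
        "deprecated in Python 3.7, removed in 3.12",
    ),
    "time.clock": OutdatedAPI(
        "time.perf_counter",
        "deprecated in Python 3.3, removed in 3.8",
    ),
}


def dotted_name(node):
    """Return 'a.b.c' for a Name/Attribute chain, or None otherwise."""
    parts = []
    while isinstance(node, ast.Attribute):
        parts.append(node.attr)
        node = node.value
    if isinstance(node, ast.Name):
        parts.append(node.id)
        return ".".join(reversed(parts))
    return None


def find_outdated_calls(code):
    """Return the flagged API names called in `code`, sorted, no duplicates."""
    found = set()
    for node in ast.walk(ast.parse(code)):
        if isinstance(node, ast.Call):
            name = dotted_name(node.func)
            if name in OUTDATED_APIS:
                found.add(name)
    return sorted(found)


def augment_prompt(prompt, flagged):
    """Append version-aware constraints for each flagged API to the prompt."""
    if not flagged:
        return prompt
    hints = [
        f"- Do not use {name} ({OUTDATED_APIS[name].note}); "
        f"prefer {OUTDATED_APIS[name].replacement}."
        for name in flagged
    ]
    return prompt + "\n\nConstraints:\n" + "\n".join(hints)


# Example: code returned by a model is checked, and the prompt is augmented.
generated = "import ssl, socket\ns = ssl.wrap_socket(socket.socket())\n"
print(augment_prompt("Open a TLS client socket.", find_outdated_calls(generated)))
```

This sketch matches calls only by their literal dotted name, so aliased imports such as `import ssl as s` are not detected; a fuller checker would need to resolve imports and track library versions.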