Large Language Models (LLMs) have become powerful tools for code generation, yet they remain prone to hallucinations: plausible but incorrect or fabricated outputs. Among these, package hallucination, where an LLM suggests non-existent dependencies, poses an emerging security risk to the software supply chain. Previous studies have focused on popular languages such as Python and JavaScript; in this work we present the first large-scale empirical study of crate hallucination in LLM-generated Rust code. We construct a multi-source dataset that combines coding tasks drawn from Stack Overflow and GitHub with LLM-generated tasks, and we evaluate both commercial and open-source models under various decoding settings. Our analysis reveals that, unlike prior findings for Python and JavaScript, hallucination behavior in Rust follows a distinct pattern: different models exhibit surprisingly consistent hallucination rates, and these rates show minimal sensitivity to model parameters. Furthermore, we investigate prompt engineering strategies that mitigate hallucinations without sacrificing code quality. This study provides new insights into the reliability and security implications of LLM-assisted Rust development, offering guidance for future research and safer model deployment in software engineering workflows.