As Large Language Models (LLMs) are increasingly integrated into software development workflows, their trustworthiness has become a critical concern. In dependency recommendation scenarios, their reliability is undermined by widespread package hallucinations, in which models recommend packages that do not exist. Recent studies have proposed a range of mitigation approaches, but these typically only reduce hallucination rates rather than eliminate them, leaving persistent software security risks. In this work, we argue that package hallucinations are theoretically preventable, based on the key insight that package validity is decidable against finite, enumerable authoritative package lists. Building on this insight, we propose PackMonitor, the first approach capable of fundamentally eliminating package hallucinations by continuously monitoring the model's decoding process and intervening when necessary. To make this practical, PackMonitor addresses three key challenges: (1) determining when to intervene, via a Context-Aware Parser that continuously monitors model outputs and activates intervention only during installation command generation; (2) determining how to intervene, via a Package-Name Intervenor that strictly limits the decoding space to an authoritative package list; and (3) ensuring monitoring efficiency, via a DFA-Caching Mechanism that scales to millions of packages with negligible overhead. Extensive experiments on five widely used LLMs demonstrate that PackMonitor is a training-free, plug-and-play solution that consistently reduces package hallucination rates to zero while maintaining low-latency inference and preserving original model capabilities.
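To make the idea of restricting the decoding space to an authoritative package list concrete, the following is a minimal sketch, not PackMonitor's actual implementation. It assumes string tokens rather than a real tokenizer vocabulary, recognizes only a single-package `pip install <name>` command, and uses a character-level trie in place of the DFA-Caching Mechanism, which is not reproduced here. The names `PackageTrie`, `INSTALL_RE`, `allowed_tokens`, and `pick_token` are hypothetical and introduced purely for illustration.

```python
import re
from typing import Dict, Iterable, List, Optional

_END = object()  # sentinel key marking "a full package name ends here"

# Assumption: only `pip install <name>` with one package is recognized.
# The capture group is the partial package name generated so far.
INSTALL_RE = re.compile(r"pip install (\S*)$")


class PackageTrie:
    """Character-level trie over an allow-list of package names."""

    def __init__(self, names: Iterable[str]) -> None:
        self.root: dict = {}
        for name in names:
            node = self.root
            for ch in name:
                node = node.setdefault(ch, {})
            node[_END] = True

    def _walk(self, prefix: str) -> Optional[dict]:
        node = self.root
        for ch in prefix:
            node = node.get(ch)
            if node is None:
                return None
        return node

    def is_prefix(self, s: str) -> bool:
        """True if s is a prefix of at least one listed package name."""
        return self._walk(s) is not None

    def is_complete(self, s: str) -> bool:
        """True if s is exactly a listed package name."""
        node = self._walk(s)
        return node is not None and _END in node


def allowed_tokens(text: str, candidates: Iterable[str],
                   trie: PackageTrie) -> List[str]:
    """Filter candidate next tokens given the text generated so far."""
    match = INSTALL_RE.search(text)
    if match is None:
        # Not inside an install command: do not intervene.
        return list(candidates)
    partial = match.group(1)
    keep = []
    for tok in candidates:
        if tok.isspace():
            # A terminator is allowed only once the name is complete.
            ok = trie.is_complete(partial)
        elif any(c.isspace() for c in tok):
            # Simplification: reject tokens mixing name text and whitespace.
            ok = False
        else:
            # A name piece must keep the name on a path to a valid package.
            ok = trie.is_prefix(partial + tok)
        if ok:
            keep.append(tok)
    return keep


def pick_token(text: str, scores: Dict[str, float],
               trie: PackageTrie) -> str:
    """Greedy choice: the highest-scoring token that survives filtering."""
    allowed = allowed_tokens(text, scores, trie)
    if not allowed:
        raise ValueError("no candidate token keeps the package name valid")
    return max(allowed, key=lambda tok: scores[tok])
```

In this sketch, a model that prefers a misspelled continuation inside an install command is steered to the best-scoring continuation that still leads to a listed package, while text outside install commands is left untouched. Because every emitted name piece must stay on a trie path and a terminator is accepted only at a complete name, any package name the filter lets through is in the list by construction.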