Recently, Large Language Models (LLMs) have shown impressive results in code generation. However, existing decoding strategies are designed for Natural Language (NL) generation and overlook the differences between NL and programming languages (PL). Because of this oversight, a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study to explore a decoding strategy specialized for code generation. By analyzing the loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. Of the two, the challenging tokens mainly appear at the beginning of a code block. Inspired by these findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling challenging tokens, allowing LLMs to explore diverse choices. We employ a smaller temperature for confident tokens to avoid the influence of tail randomness noise. We apply AdapT sampling to LLMs of different sizes and conduct evaluations on two popular datasets. Results show that AdapT sampling significantly outperforms the state-of-the-art decoding strategy.
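The idea of switching temperature per token can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a token counts as "challenging" when it is the first token of the output or directly follows a token ending in a newline, a simple stand-in for "the beginning of a code block". The function names and the temperature values 1.0 and 0.2 are hypothetical, chosen only for illustration.

```python
import math
import random
from typing import List, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def is_block_start(generated: Sequence[str]) -> bool:
    """Assumed heuristic: a block starts at the first token or after a newline."""
    return len(generated) == 0 or generated[-1].endswith("\n")


def adaptive_temperature(generated: Sequence[str],
                         high: float = 1.0,   # hypothetical value for challenging tokens
                         low: float = 0.2) -> float:  # hypothetical value for confident tokens
    """Pick a larger temperature for (assumed) challenging positions."""
    return high if is_block_start(generated) else low


def sample_token(logits: Sequence[float],
                 vocab: Sequence[str],
                 generated: Sequence[str],
                 rng: random.Random) -> str:
    """Sample one token using the position-dependent temperature."""
    temperature = adaptive_temperature(generated)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(vocab, weights=probs, k=1)[0]
```

With these assumed settings, the first token of each line is drawn from a flatter distribution, while later tokens on the same line are drawn from a sharper one that suppresses low-probability tail tokens.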