Large Language Models (LLMs) have recently shown impressive results in code generation. However, existing decoding strategies are designed for natural language (NL) generation and overlook the differences between NL and programming languages (PL). As a result, what constitutes a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study of a decoding strategy specialized for code generation. By analyzing the loss distributions of code tokens, we find that code tokens fall into two categories: challenging tokens, which are difficult to predict, and confident tokens, which can be easily inferred. The challenging tokens mainly appear at the beginning of a code block. Inspired by these findings, we propose a simple yet effective method, Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling challenging tokens, allowing LLMs to explore diverse choices, and a smaller temperature when sampling confident tokens, which limits the influence of random noise from the tail of the distribution. We apply AdapT sampling to LLMs of different sizes and evaluate it on two popular datasets. Results show that AdapT sampling significantly outperforms the state-of-the-art decoding strategy.
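The idea of switching temperature by token type can be illustrated with a minimal sketch. It is not the authors' implementation: the helper names (`is_block_start`, `adaptive_temperature`, `sample_token`), the temperature values `high=1.0` and `low=0.2`, and the heuristic that a code block starts at the first token or right after a token ending in a newline are all assumptions made for illustration.

```python
import math
import random
from typing import List, Optional, Sequence


def softmax_with_temperature(logits: Sequence[float], temperature: float) -> List[float]:
    """Convert logits to probabilities after dividing them by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def is_block_start(prev_token: Optional[str]) -> bool:
    """Assumed heuristic: a block starts at the first token or after a newline."""
    return prev_token is None or prev_token.endswith("\n")


def adaptive_temperature(prev_token: Optional[str],
                         high: float = 1.0,
                         low: float = 0.2) -> float:
    """Return a larger temperature at block starts and a smaller one elsewhere.

    The values of `high` and `low` are hypothetical, not taken from the paper.
    """
    return high if is_block_start(prev_token) else low


def sample_token(logits: Sequence[float],
                 prev_token: Optional[str],
                 rng: random.Random) -> int:
    """Sample a token index using the position-dependent temperature."""
    probs = softmax_with_temperature(logits, adaptive_temperature(prev_token))
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(0)
    logits = [2.0, 1.0, 0.0]
    # At a block start the distribution is flatter, so sampling explores more.
    print(softmax_with_temperature(logits, adaptive_temperature(None)))
    # Inside a block the distribution is sharper, so tail tokens are rarely drawn.
    print(softmax_with_temperature(logits, adaptive_temperature("x")))
    print(sample_token(logits, None, rng))
```

In a real decoder the logits would come from the model at each step and the block-start test would depend on the tokenizer; the sketch only shows how a per-token temperature changes the sampling distribution.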