Recently, Large Language Models (LLMs) have shown impressive abilities in code generation. However, existing LLMs' decoding strategies are designed for Natural Language (NL) generation and overlook the differences between NL and programming languages (PL). Because of this oversight, a better decoding strategy for code generation remains an open question. In this paper, we conduct the first systematic study to explore a decoding strategy specialized for code generation. By analyzing the loss distributions of code tokens, we find that code tokens can be divided into two categories: challenging tokens that are difficult to predict and confident tokens that can be easily inferred. Among them, the challenging tokens mainly appear at the beginning of a code block. Inspired by these findings, we propose a simple yet effective method: Adaptive Temperature (AdapT) sampling, which dynamically adjusts the temperature coefficient when decoding different tokens. We apply a larger temperature when sampling challenging tokens, allowing LLMs to explore diverse choices, and a smaller temperature for confident tokens, to avoid the influence of tail randomness noise. We apply AdapT sampling to LLMs of different sizes and conduct evaluations on two popular datasets. Results show that AdapT sampling significantly outperforms the state-of-the-art decoding strategy.
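To make the idea of a per-token temperature concrete, the following is a minimal sketch rather than the paper's implementation. It assumes a hypothetical boolean flag `is_block_start` that marks tokens at the beginning of a code block, and the temperature values `high=1.0` and `low=0.2` are illustrative placeholders, not settings reported in the abstract.

```python
import math
import random


def softmax_with_temperature(logits, temperature):
    """Turn logits into probabilities after dividing them by the temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def adaptive_temperature(is_block_start, high=1.0, low=0.2):
    """Hypothetical rule: a larger temperature at the start of a code block
    (where challenging tokens tend to appear), a smaller one elsewhere.
    The values of `high` and `low` are assumptions for illustration."""
    return high if is_block_start else low


def sample_token(logits, is_block_start, rng=random):
    """Sample one token index using the position-dependent temperature."""
    temperature = adaptive_temperature(is_block_start)
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With the same logits, the larger temperature yields a flatter distribution (more exploration), while the smaller one concentrates probability on the top token and suppresses the low-probability tail. In a real decoder, `sample_token` would be called once per step with the model's logits for that step.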