Large language models (LLMs) have achieved impressive performance in code generation. However, due to the long-tail distribution of their training data, low-frequency terms are typically underrepresented during training. Consequently, LLMs often misunderstand or overlook problem-specific, low-frequency keywords during code generation, compromising the accuracy of the generated code. To address this, we propose a novel technique named SEK (\textbf{S}elf-\textbf{E}lf-\textbf{E}xplained \textbf{K}eywords), which empowers an LLM for better code generation by having the LLM itself extract and explain the key terms in the problem description and rank them by frequency. Comprehensive experiments on three benchmarks, i.e., HumanEval(+), MBPP(+), and APPS, with five representative LLMs show that SEK yields substantial and consistent improvements in code generation. For instance, SEK improves the Pass@1 of DeepSeek-Coder-V2-Instruct from 85.4\% to 93.3\% on the HumanEval benchmark. Further analysis confirms that SEK enables LLMs to shift their attention from low-frequency keywords to their corresponding high-frequency counterparts.
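The frequency-based ranking step can be illustrated with a minimal sketch. This is not the paper's implementation: the keyword list and the reference corpus below are hypothetical, and the LLM-driven extraction and explanation steps are omitted; only the idea of ordering extracted keywords by corpus frequency (rarest first, so underrepresented terms receive the most attention) is shown.

```python
from collections import Counter

def rank_keywords(keywords, corpus_tokens):
    """Rank extracted keywords by their frequency in a reference corpus,
    rarest first, so low-frequency terms are surfaced for explanation."""
    freq = Counter(t.lower() for t in corpus_tokens)
    return sorted(keywords, key=lambda k: freq[k.lower()])

# Hypothetical toy corpus and keyword set for illustration only.
corpus = "sort the list then return the sorted list".split()
print(rank_keywords(["palindrome", "list", "sort"], corpus))
# -> ['palindrome', 'sort', 'list']
```

In the full pipeline, such a ranking would be applied to keywords the LLM itself extracted from the problem description, and the rarest terms would then be explained back to the model before code generation.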