Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries in order to solve user-instructed tasks. Recent work has shown that large proprietary LLMs can learn novel library usage in-context from demonstrations. These results raise several open questions, including whether demonstrations of library usage are required and whether smaller (and more open) models also possess such capabilities. In this work, we take a broader approach by systematically evaluating a diverse array of LLMs across three scenarios reflecting varying levels of domain specialization, to understand their abilities and limitations in generating code based on libraries defined in-context. Our results show that even smaller open-source LLMs like Llama-2 and StarCoder demonstrate an adept understanding of novel code libraries based on specifications presented in-context. Our findings further reveal that LLMs exhibit a surprisingly high proficiency in learning novel library modules even when provided with just natural language descriptions or raw code implementations of the functions, which are often cheaper to obtain than demonstrations. Overall, our results pave the way for harnessing LLMs in more adaptable and dynamic coding environments.