Contemporary Large Language Models (LLMs) exhibit a high degree of code generation and comprehension capability. A particularly promising area is their ability to interpret code modules from unfamiliar libraries to solve user-instructed tasks. Recent work has shown that large proprietary LLMs can learn novel library usage in-context from demonstrations. These results raise several open questions: whether demonstrations of library usage are required, and whether smaller (and more open) models also possess such capabilities. In this work, we take a broader approach by systematically evaluating a diverse array of LLMs across three scenarios reflecting varying levels of domain specialization, in order to understand their abilities and limitations in generating code based on libraries defined in-context. Our results show that even smaller open-source LLMs like Llama-2 and StarCoder demonstrate an adept understanding of novel code libraries based on specifications presented in-context. Our findings further reveal that LLMs exhibit surprisingly high proficiency in learning novel library modules even when provided with only natural language descriptions or raw code implementations of the functions, which are often cheaper to obtain than demonstrations. Overall, our results pave the way for harnessing LLMs in more adaptable and dynamic coding environments.