GPT-5, a state-of-the-art large language model from OpenAI, demonstrates strong performance in widely used programming languages such as Python, C++, and Java; however, its ability to operate in low-resource or less commonly used languages remains underexplored. This work investigates whether GPT-5 can effectively acquire proficiency in an unfamiliar functional programming language, Idris, through iterative, feedback-driven prompting. We first establish a baseline showing that, with zero-shot prompting, the model solves only 22 of 56 Idris exercises from the Exercism platform, substantially underperforming its results in higher-resource languages (45 of 50 in Python and 35 of 47 in Erlang). We then evaluate several refinement strategies: iterative prompting based on platform feedback, augmenting prompts with documentation and error-classification guides, and iterative prompting using local compilation errors and failed test cases. Among these, incorporating local compilation errors yields the largest improvement: with this structured, error-guided refinement loop, GPT-5 solves 54 of the 56 problems. These results suggest that while large language models may initially struggle in low-resource settings, structured compiler-level feedback can play a critical role in unlocking their capabilities.
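The error-guided refinement loop described above can be sketched as follows. This is a minimal illustration, not the actual experimental harness: `compile_and_test` stands in for invoking the Idris compiler and Exercism test suite locally, and `ask_model` stands in for a GPT-5 API call; both are hypothetical names introduced here for clarity.

```python
from typing import Callable, Tuple

def compile_and_test(src: str) -> Tuple[bool, str]:
    # Stand-in for running the Idris compiler and the exercise's tests
    # locally; returns (success, diagnostics). Simulated here: the code
    # "compiles" once it carries a `total` annotation.
    if "total" in src:
        return True, ""
    return False, "Error: possibly non-total function (simulated diagnostic)"

def refine(src: str,
           ask_model: Callable[[str, str], str],
           max_iters: int = 5) -> str:
    # Structured, error-guided refinement: compile, feed the compiler's
    # diagnostics back to the model, and repeat until the code passes
    # or the iteration budget is exhausted.
    for _ in range(max_iters):
        ok, errors = compile_and_test(src)
        if ok:
            return src
        src = ask_model(src, errors)  # model proposes a fix given the errors
    return src

# Toy "model" that simply prepends the missing annotation.
fix = lambda src, errors: "total\n" + src
result = refine("add : Nat -> Nat -> Nat\nadd x y = x + y", fix)
```

The key design choice is that the loop feeds raw compiler diagnostics, rather than only pass/fail test results, back into the prompt at each iteration.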