Large language models (LLMs) have emerged as a groundbreaking innovation in question answering and conversational agents. These models are built on deep learning architectures such as the Transformer and are trained on vast corpora to generate text in response to given queries. Among them, ChatGPT, developed by OpenAI, has ushered in a new era by applying artificial intelligence (AI) to diverse problem domains, ranging from composing essays and biographies to solving intricate mathematical integrals. This versatility offers immense value to users. However, assessing the quality of ChatGPT's output is challenging, particularly when queries lack clear objective criteria for correctness. For instance, evaluating generated essays is arduous and relies heavily on manual labor. By contrast, solutions to well-defined, closed-ended questions such as mathematical problems are much easier to evaluate. This paper examines how effectively ChatGPT solves programming problems, assessing both the correctness of its solutions and their efficiency in terms of time and memory complexity. The study finds an overall success rate of 71.875%, defined as the proportion of problems for which ChatGPT produced a solution that passed all of LeetCode's test cases. ChatGPT performs well on structured problems, and its success rate is linearly correlated with the problems' acceptance rates. However, it struggles to improve its solutions based on feedback, which points to potential shortcomings in debugging tasks. These findings offer a compact yet insightful view of ChatGPT's capabilities and areas for improvement.
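The success-rate metric above counts a problem as solved only if the submission passes every test case. A partially passing submission counts as a failure. The following is a minimal sketch of that definition. The `ProblemResult` record and the `success_rate` helper are hypothetical names introduced for illustration and are not taken from the paper. The counts in the usage note are likewise illustrative. If the reported rate is exact, 71.875% equals 23/32 in lowest terms, so the number of problems would have to be a multiple of 32. The abstract does not state that number.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass
class ProblemResult:
    # Hypothetical per-problem record (not from the paper).
    name: str
    passed: int  # test cases the submission passed
    total: int   # test cases in the judge's suite


def success_rate(results: list[ProblemResult]) -> Fraction:
    """Fraction of problems whose submission passed *all* test cases."""
    if not results:
        raise ValueError("no results to score")
    # A problem counts as solved only on a full pass; partial passes are failures.
    solved = sum(1 for r in results if r.passed == r.total)
    return Fraction(solved, len(results))
```

For example, with 23 fully passing problems out of 32, `success_rate` returns `Fraction(23, 32)`, and `float(...)` of that value is `0.71875`, i.e. 71.875%. Returning a `Fraction` keeps the ratio exact, which makes it easy to check whether a reported percentage is consistent with a given problem count.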