This paper evaluates how well two state-of-the-art artificial intelligence (AI) models, GPT-3.5 and Bard, generate Java code from a function description. We sourced the descriptions from CodingBat.com, a popular online platform that provides practice problems for learning to program. We compared the Java code generated by the two models on correctness, verified with the platform's own test cases. The results show a clear difference in capability: GPT-3.5 generated correct code for approximately 90.6% of the function descriptions, whereas Bard did so for 53.1%. Both models exhibited strengths and weaknesses, and these findings suggest avenues for developing and refining more advanced AI-assisted code generation tools. The study underlines the potential of AI to automate and support aspects of software development, although further research is required to fully realize it.
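To make the correctness criterion concrete, the following is a minimal sketch of how a generated function could be scored against test cases and how a percentage of correct solutions could be derived. It is not the platform's actual harness: the class `CorrectnessCheck`, the helpers `passesAll` and `percentCorrect`, and the four test cases are hypothetical, and `sleepIn` merely stands in for a model-generated answer to a CodingBat-style warm-up problem. The sketch assumes that a solution counts as correct only if it passes every test case.

```java
import java.util.function.BiPredicate;

class CorrectnessCheck {

    // Hypothetical stand-in for model-generated code: return true when we
    // can sleep in, i.e. it is not a weekday or we are on vacation.
    static boolean sleepIn(boolean weekday, boolean vacation) {
        return !weekday || vacation;
    }

    // Hypothetical test cases, each row: {weekday, vacation, expected}.
    static final boolean[][] TESTS = {
        {false, false, true},
        {true,  false, false},
        {false, true,  true},
        {true,  true,  true},
    };

    // Assumption: a solution is "correct" only if every test case passes.
    static boolean passesAll(BiPredicate<Boolean, Boolean> candidate,
                             boolean[][] tests) {
        for (boolean[] t : tests) {
            if (candidate.test(t[0], t[1]) != t[2]) {
                return false;
            }
        }
        return true;
    }

    // Percentage of problems whose generated solution was judged correct.
    static double percentCorrect(boolean[] perProblemResults) {
        if (perProblemResults.length == 0) {
            return 0.0;
        }
        int correct = 0;
        for (boolean ok : perProblemResults) {
            if (ok) {
                correct++;
            }
        }
        return 100.0 * correct / perProblemResults.length;
    }

    public static void main(String[] args) {
        boolean ok = passesAll(CorrectnessCheck::sleepIn, TESTS);
        System.out.println("sleepIn passes all tests: " + ok);

        // Illustrative per-problem outcomes only, not data from the study.
        boolean[] outcomes = {true, true, false, true};
        System.out.println("percent correct: " + percentCorrect(outcomes));
    }
}
```

Scoring per problem rather than per test case keeps the reported figure easy to interpret: each function description contributes either a pass or a fail to the final percentage.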