The advent of Large Language Models (LLMs) has markedly increased the efficiency and speed with which tasks are completed, a significant gain in productivity through technological innovation. As these chatbots tackle increasingly complex tasks, assessing the quality of their outputs has become a central challenge. This paper critically examines the output quality of two leading LLMs, OpenAI's ChatGPT and Google's Gemini AI, by comparing the programming code generated by their free versions. We investigate the quality of this code using a real-world example together with a systematic dataset. Code generation is a particularly compelling area for analysis because both chatbots are notably proficient at it. Furthermore, programming code often becomes complex enough that verifying it is a formidable task, which underscores the importance of our study. This research aims to shed light on the efficacy and reliability of LLMs in generating high-quality programming code, a question with significant implications for software development and beyond.