User-Centric Evaluation of ChatGPT Capability of Generating R Program Code

This paper reports an evaluation of ChatGPT's capability to generate R programming language code from natural language input. A dataset designed specifically for R code generation was constructed, with metadata that supports scenario-based testing and evaluation of code generation across usage scenarios of different difficulty levels and different program types. The evaluation follows a multiple-attempt process: the tester tries to complete each code generation task over a number of attempts, stopping when a satisfactory solution is obtained or giving up after a fixed maximum number of attempts. In each attempt, the tester formulates a natural language input to ChatGPT based on the previous results and the task to be completed. In addition to the average number of attempts and the average time taken to complete the tasks, the final generated solutions are assessed on a number of quality attributes: accuracy, completeness, conciseness, readability, well-structuredness, logic clarity, depth of explanation, and coverage of parameters. Our experiments demonstrate that ChatGPT is in general highly capable of generating high-quality R program code and textual explanations, although it may fail on hard programming tasks. The experimental data also show that human developers can hardly improve their skill in using ChatGPT to generate code simply through accumulated experience.
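
The multiple-attempt process and the two headline metrics described in the abstract can be illustrated with a minimal sketch. This is not the authors' tooling: the names `run_task`, `summarize`, and `TaskResult`, the cap of 5 attempts, and the `unsolved` count are all assumptions made for illustration, and `attempt_fn` stands in for a human tester prompting ChatGPT and judging the result.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, List, Tuple

# Assumed cap: the abstract only says "a fixed maximum number of attempts".
MAX_ATTEMPTS = 5


@dataclass
class TaskResult:
    task_id: str
    attempts: int    # prompts sent before stopping
    seconds: float   # total time spent on the task
    solved: bool     # True if a satisfactory solution was obtained


def run_task(
    task_id: str,
    attempt_fn: Callable[[int], Tuple[bool, float]],
    max_attempts: int = MAX_ATTEMPTS,
) -> TaskResult:
    """Repeat attempts until one is satisfactory or the cap is reached.

    attempt_fn(attempt_number) is a hypothetical stand-in for one round of
    prompting; it returns (satisfactory, seconds_spent).
    """
    total_seconds = 0.0
    for attempt in range(1, max_attempts + 1):
        satisfactory, seconds = attempt_fn(attempt)
        total_seconds += seconds
        if satisfactory:
            return TaskResult(task_id, attempt, total_seconds, True)
    # The tester gives up after the final allowed attempt.
    return TaskResult(task_id, max_attempts, total_seconds, False)


def summarize(results: List[TaskResult]) -> dict:
    """Average attempts and time per task, plus a count of abandoned tasks."""
    return {
        "avg_attempts": mean(r.attempts for r in results),
        "avg_seconds": mean(r.seconds for r in results),
        "unsolved": sum(1 for r in results if not r.solved),
    }
```

For example, calling `run_task("hard", lambda n: (False, 60.0))` models a task that is abandoned after all five attempts, and passing a list of such results to `summarize` yields the averages. The quality attributes (accuracy, completeness, and so on) are assessed by people in the paper and are deliberately left out of this sketch.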

