Generative AI models, specifically large language models (LLMs), have made strides towards the long-standing goal of text-to-code generation. This progress has invited numerous studies of user interaction. However, less is known about the struggles and strategies of non-experts, for whom each step of the text-to-code problem presents challenges: describing their intent in natural language, evaluating the correctness of generated code, and editing prompts when the generated code is incorrect. This paper presents a large-scale controlled study of how 120 beginning coders across three academic institutions approach writing and editing prompts. A novel experimental design allows us to target specific steps in the text-to-code process and reveals that beginners struggle with writing and editing prompts, even for problems at their skill level and when correctness is automatically determined. Our mixed-methods evaluation provides insight into student processes and perceptions with key implications for non-expert Code LLM use within and outside of education.