Large Language Models (LLMs) are increasingly integrated into software development workflows, yet their behavior in structured, specification-driven processes remains poorly understood. This paper presents an empirical study design using CURRANTE, a Visual Studio Code extension that enables a human-in-the-loop workflow for LLM-assisted code generation. The tool guides developers through three sequential stages--Specification, Tests, and Function--allowing them to define requirements, generate and refine test suites, and produce functions that satisfy those tests. Participants will solve medium-difficulty problems from the LiveCodeBench dataset, while the tool records fine-grained interaction logs, effectiveness metrics (e.g., pass rate, all-pass completion), efficiency indicators (e.g., time-to-pass), and iteration behaviors. The study aims to analyze how human intervention in specification and test refinement influences the quality and dynamics of LLM-generated code. The results will provide empirical insights into the design of next-generation development environments that align human reasoning with model-driven code generation.
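As a rough illustration (not part of the paper itself), the sketch below shows one way the effectiveness and efficiency metrics named above (pass rate, all-pass completion, time-to-pass, and iteration counts) could be derived from a per-session interaction log. The `Event` record, its field names, and the log layout are hypothetical assumptions for illustration, not CURRANTE's actual logging schema.

```python
# Minimal sketch: deriving study metrics from a hypothetical per-session log.
# The Event structure and field names are assumptions, not CURRANTE's schema.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class Event:
    timestamp: float   # seconds since session start
    stage: str         # "specification" | "tests" | "function"
    tests_passed: int  # tests passing after this event
    tests_total: int   # tests defined so far

def pass_rate(events: List[Event]) -> float:
    """Fraction of tests passing at the end of the session."""
    last = events[-1]
    return last.tests_passed / last.tests_total if last.tests_total else 0.0

def all_pass(events: List[Event]) -> bool:
    """Whether the participant ever reached an all-pass state."""
    return any(e.tests_total > 0 and e.tests_passed == e.tests_total for e in events)

def time_to_pass(events: List[Event]) -> Optional[float]:
    """Elapsed time until the first all-pass event, or None if never reached."""
    for e in events:
        if e.tests_total > 0 and e.tests_passed == e.tests_total:
            return e.timestamp
    return None

def stage_iterations(events: List[Event], stage: str) -> int:
    """Number of logged events in a given stage, a rough proxy for refinement iterations."""
    return sum(1 for e in events if e.stage == stage)
```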