Background: Code generation tools such as ChatGPT have recently attracted attention for their performance. Selecting a new code generation tool from a list of candidates generally requires a prior analysis of each tool's performance. Without such analysis, there is a higher risk of selecting an ineffective tool, which negatively affects software development productivity. However, conducting a prior analysis of new code generation tools takes time and effort. Aim: To use new code generation tools without prior analysis while keeping the risk low, we propose evaluating the tools during software development (i.e., online optimization). Method: We apply the bandit algorithm (BA) approach to help select the best code generation tool among the candidates. Developers judge whether each result produced by a tool is correct. As code generation and evaluation are repeated, the evaluation results are stored, and the BA approach uses these stored results to select the best tool. In a preliminary analysis, we used BA to evaluate five code generation tools on 164 code generation cases. Result: As the evaluation proceeded, the BA approach selected ChatGPT as the best tool, and during the evaluation the average accuracy of the BA approach exceeded that of the second-best performing tool. Our results indicate the feasibility and effectiveness of BA in assisting the selection of the best-performing code generation tool.
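The select, evaluate, and store loop described in the Method can be illustrated with a minimal sketch. The abstract does not state which bandit algorithm is used, so the sketch assumes epsilon-greedy, one common choice. The class name `EpsilonGreedySelector`, the function `simulate`, the tool names, and the per-tool accuracies are all hypothetical, and the developer's correctness judgment is replaced by a random draw against an assumed accuracy.

```python
import random


class EpsilonGreedySelector:
    """Choose a code generation tool per task (epsilon-greedy; an assumed BA variant)."""

    def __init__(self, tools, epsilon=0.1, seed=0):
        self.tools = list(tools)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        # Stored evaluation results: how often each tool was used and judged correct.
        self.trials = {t: 0 for t in self.tools}
        self.successes = {t: 0 for t in self.tools}

    def accuracy(self, tool):
        """Observed fraction of correct results for a tool (0.0 if never tried)."""
        n = self.trials[tool]
        return self.successes[tool] / n if n else 0.0

    def select(self):
        """Return the tool to use for the next code generation task."""
        untried = [t for t in self.tools if self.trials[t] == 0]
        if untried:
            return untried[0]  # try every tool once before exploiting
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.tools)  # explore
        return max(self.tools, key=self.accuracy)  # exploit the best so far

    def update(self, tool, correct):
        """Store the developer's judgment of the generated code."""
        self.trials[tool] += 1
        self.successes[tool] += int(correct)


def simulate(true_accuracy, n_cases=164, epsilon=0.1, seed=0):
    """Run the loop with simulated feedback; true_accuracy values are hypothetical."""
    selector = EpsilonGreedySelector(true_accuracy, epsilon, seed)
    feedback = random.Random(seed + 1)
    for _ in range(n_cases):
        tool = selector.select()
        # Stand-in for a developer deciding whether the generated code is correct.
        correct = feedback.random() < true_accuracy[tool]
        selector.update(tool, correct)
    return selector


if __name__ == "__main__":
    # Hypothetical accuracies for five placeholder tools; not results from the study.
    assumed = {"tool_a": 0.7, "tool_b": 0.5, "tool_c": 0.4, "tool_d": 0.3, "tool_e": 0.2}
    result = simulate(assumed)
    for name in result.tools:
        print(name, result.trials[name], round(result.accuracy(name), 2))
```

Running the script prints, for each placeholder tool, how often it was selected and its observed accuracy. In real use, `update` would be called with the developer's actual judgment instead of the simulated draw, and the exploration rate `epsilon` would need tuning for the project.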