AI-powered code-recommendation systems, such as Copilot and CodeWhisperer, provide code suggestions inside a programmer's environment (e.g., an IDE) with the aim of improving their productivity. Because programmers accept and reject suggestions in these settings, such a system should ideally use this feedback to further that goal. In this work, we leverage prior data of programmers interacting with Copilot to develop interventions that can save programmer time. We propose a utility-theoretic framework that models this interaction with programmers and decides when and which suggestions to display. Our framework, Conditional suggestion Display from Human Feedback (CDHF), is based on predictive models of programmer actions. Using data from 535 programmers, we build models that predict the likelihood of suggestion acceptance. In a retrospective evaluation on real-world programming tasks solved with AI-assisted programming, we find that CDHF can achieve favorable tradeoffs. Our findings show the promise of integrating human feedback to improve interaction with large language models in scenarios such as programming and possibly writing tasks.
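To make the idea of deciding "when and which suggestions to display" concrete, the following is a minimal sketch of an expected-utility display rule. It is an illustration only and not the CDHF implementation: the names `Suggestion`, `expected_utility`, and `choose_suggestion`, the time values `gain_if_accepted` and `cost_if_rejected`, and the assumption that hiding a suggestion has utility zero are all hypothetical. The acceptance probability `p_accept` stands in for the output of a predictive model of programmer actions.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Suggestion:
    text: str
    # Predicted probability that the programmer accepts this suggestion.
    # Assumed to come from some acceptance model; hypothetical here.
    p_accept: float


def expected_utility(p_accept: float,
                     gain_if_accepted: float,
                     cost_if_rejected: float) -> float:
    """Expected time saved (arbitrary units) if the suggestion is shown.

    Assumption: an accepted suggestion saves `gain_if_accepted`, and a
    rejected one costs `cost_if_rejected` in time spent reading it.
    """
    return p_accept * gain_if_accepted - (1.0 - p_accept) * cost_if_rejected


def choose_suggestion(candidates: List[Suggestion],
                      gain_if_accepted: float = 5.0,
                      cost_if_rejected: float = 2.0) -> Optional[Suggestion]:
    """Return the candidate with the highest positive expected utility.

    Returns None when no candidate beats the utility of showing nothing,
    which is assumed to be zero.
    """
    best: Optional[Suggestion] = None
    best_utility = 0.0  # utility of displaying nothing (assumption)
    for candidate in candidates:
        utility = expected_utility(
            candidate.p_accept, gain_if_accepted, cost_if_rejected
        )
        if utility > best_utility:
            best, best_utility = candidate, utility
    return best
```

Under these assumed values, a suggestion is shown only when its predicted acceptance probability exceeds `cost_if_rejected / (gain_if_accepted + cost_if_rejected)`, so raising the cost of a rejected suggestion makes the rule more conservative about displaying anything.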