AI-powered code-recommendation systems, such as Copilot and CodeWhisperer, provide code suggestions inside a programmer's environment (e.g., an IDE) with the aim of improving their productivity. Since programmers accept and reject suggestions in these scenarios, such a system should ideally use this feedback to further that goal. In this work, we leverage prior data of programmers interacting with GitHub Copilot, a system used by millions of programmers, to develop interventions that can save programmer time. We propose a utility-theoretic framework that models this interaction with programmers and decides which suggestions to display. Our framework, Conditional suggestion Display from Human Feedback (CDHF), relies on a cascade of models that predict suggestion acceptance in order to selectively hide suggestions, reducing both latency and programmer verification time. Using data from 535 programmers, we perform a retrospective evaluation of CDHF and show that it can avoid displaying a significant fraction of suggestions that would have been rejected, and that it can do so without full knowledge of the suggestions themselves. Through ablations on user study data, we further demonstrate the importance of incorporating the programmer's latent, unobserved state when deciding when to display suggestions. Finally, we show that using suggestion acceptance as a reward signal for deciding which suggestions to display leads to lower-quality suggestions, indicating an unexpected pitfall.
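As an illustration of the cascade idea described in the abstract, the following minimal sketch shows one way a two-stage display decision could be structured. It is not the authors' implementation: the function name `cascade_display`, the two-stage split, the callable signatures, and the threshold values are all assumptions made for this example. The first stage scores the context alone, so a suggestion predicted to be rejected can be hidden before it is even generated (saving latency); the second stage scores the generated suggestion before it is shown (saving verification time).

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Decision:
    display: bool                     # whether to show the suggestion
    stage: str                        # which stage made the final call
    suggestion: Optional[str] = None  # populated only when displayed


def cascade_display(
    context: dict,
    stage1_accept_prob: Callable[[dict], float],
    generate: Callable[[dict], str],
    stage2_accept_prob: Callable[[dict, str], float],
    hide_below_1: float = 0.1,  # hypothetical threshold, not from the paper
    hide_below_2: float = 0.3,  # hypothetical threshold, not from the paper
) -> Decision:
    """Decide whether to display a suggestion using a two-stage cascade."""
    # Stage 1: predict acceptance from context only. If acceptance looks
    # unlikely, hide without ever generating the suggestion.
    if stage1_accept_prob(context) < hide_below_1:
        return Decision(display=False, stage="stage1")

    # Stage 2: generate the suggestion, then predict acceptance again
    # using both the context and the suggestion itself.
    suggestion = generate(context)
    if stage2_accept_prob(context, suggestion) < hide_below_2:
        return Decision(display=False, stage="stage2")

    return Decision(display=True, stage="stage2", suggestion=suggestion)
```

In this sketch the acceptance predictors are passed in as plain callables, so any model trained on accept/reject telemetry could be substituted; the thresholds would need to be chosen so that few suggestions that would have been accepted are hidden.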