Predicting Developer Acceptance of AI-Generated Code Suggestions

AI-assisted programming tools are widely adopted, yet their practical utility is often undermined by undesired suggestions that interrupt developer workflows and cause frustration. While existing research has qualitatively explored developer-AI interactions during programming, a significant gap remains in the quantitative analysis of developers' acceptance of AI-generated code suggestions, partly because the necessary fine-grained interaction data is often proprietary. To bridge this gap, we conduct an empirical study using 66,329 industrial developer-AI interactions from a large technology company. We analyze the features that differ significantly between accepted and rejected code suggestions. We find that accepted suggestions are characterized by significantly higher historical acceptance counts and ratios for both developers and projects, longer generation intervals, shorter preceding code context in the project, and older IDE versions. Based on these findings, we introduce CSAP (Code Suggestion Acceptance Prediction) to predict whether a developer will accept a code suggestion before it is displayed. Our evaluation shows that CSAP achieves accuracies of 0.973 and 0.922 on the imbalanced and balanced datasets, respectively. Compared to a large language model baseline and an in-production industrial filter, CSAP improves accuracy by a relative 12.6% and 69.5% on the imbalanced dataset, and by a relative 87.0% and 140.1% on the balanced dataset. Our results demonstrate that targeted personalization is a powerful approach for filtering out code suggestions that are predicted to be rejected, thereby reducing developer interruptions. To the best of our knowledge, this is the first quantitative study of code suggestion acceptance on large-scale industrial data, and it sheds light on an important research direction in AI-assisted programming.
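To make the relative-improvement figures concrete, the sketch below shows the arithmetic behind them. It assumes the usual definition of relative improvement, (new - baseline) / baseline, which the abstract does not state explicitly. The function names are hypothetical, and the baseline accuracies it prints are back-calculated from the rounded figures quoted above, so they are approximations rather than reported results.

```python
def relative_improvement(new_acc: float, baseline_acc: float) -> float:
    """Relative improvement of new_acc over baseline_acc, in percent.

    Assumed definition: (new - baseline) / baseline * 100.
    """
    return (new_acc - baseline_acc) / baseline_acc * 100.0


def implied_baseline(new_acc: float, rel_improvement_pct: float) -> float:
    """Invert the definition above to recover an approximate baseline accuracy."""
    return new_acc / (1.0 + rel_improvement_pct / 100.0)


# Figures quoted in the abstract:
# (dataset, baseline, CSAP accuracy, relative improvement in percent)
quoted = [
    ("imbalanced", "LLM baseline", 0.973, 12.6),
    ("imbalanced", "industrial filter", 0.973, 69.5),
    ("balanced", "LLM baseline", 0.922, 87.0),
    ("balanced", "industrial filter", 0.922, 140.1),
]

for dataset, baseline, csap_acc, gain_pct in quoted:
    # Back-calculated value, not a number reported in the abstract.
    approx = implied_baseline(csap_acc, gain_pct)
    print(f"{dataset:10s} {baseline:18s} implied accuracy ~ {approx:.3f}")
```

Under this assumed definition, a relative gain above 100% (such as 140.1%) means CSAP's accuracy is more than double the baseline's, which is only possible when the baseline accuracy is below 0.5.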
