Fast and Forgettable: A Controlled Study of Novices' Performance, Learning, Workload, and Emotion in AI-Assisted and Human Pair Programming Paradigms

Code-generating artificial intelligence (AI) has gained popularity in both professional and educational programming settings over the past several years. While research and pedagogy are beginning to cope with this change, computing students are left to bear the unforeseen consequences of AI amid a dearth of empirical evidence about its effects. Although pair programming between students is well studied and known to benefit self-efficacy and academic achievement, it remains underutilized and is further threatened by the proposition that AI can replace a human programming partner. In this paper, we present a controlled pair programming study with 22 participants who wrote Python code under time pressure for 20 minutes in each of two conditions: in teams of two, and individually with GitHub Copilot. Participants were incentivized with bonus compensation to balance performance with understanding, and were retested individually on the programming tasks after a one-week retention interval. We collected subjective measures of workload and emotion as well as objective measures of performance and learning (retest performance). Results showed that participants performed significantly better with GitHub Copilot than with their human teammate, and several dimensions of their workload were significantly reduced. However, working with the human teammate had a significantly more positive and arousing emotional effect than working with Copilot. Furthermore, the AI condition showed a nonsignificant reduction in absolute retest performance and a larger retest performance decrement. We recommend that educators strongly consider revisiting pair programming as an educational tool in addition to embracing modern AI.
