AI approaches are progressively besting humans at game-related tasks (e.g., chess). The next stage is expected to be Human-AI collaboration; however, research on this subject has produced mixed results and needs additional data points. We add to this nascent literature by studying Human-AI collaboration on a common administrative educational task. Education is a special domain in its relation to AI and has been slow to adopt AI approaches in practice, both out of concern that the educational enterprise will lose its humanistic touch and because a high standard of quality is demanded, given the impact on a person's career and developmental trajectory. In this study (N = 22), we design an experiment to explore the effect of Human-AI collaboration on the task of tagging educational content with skills from the US Common Core taxonomy. Our results show that the experiment group (with AI recommendations) saved around 50% of the time (p < 0.01) spent on the tagging task, but at the cost of 7.7% recall (p = 0.267) and 35% accuracy (p = 0.1170) compared with the control group that worked without AI, placing the AI+human group between the AI alone (lowest performance) and the human alone (highest performance). We further analyze log data from this AI collaboration experiment to explore the circumstances under which humans still exercised their discernment when receiving recommendations. Finally, we outline how this study can assist in implementing AI tools, like ChatGPT, in education.