Agentic AI and Human-in-the-Loop Interventions: Field Experimental Evidence from Alibaba's Customer Service Operations

Agentic AI systems that autonomously perform service tasks are entering customer service operations. However, limited evidence exists on how human interventions shape service outcomes when agentic AI failures create both cognitive and emotional consequences. We study this issue through a randomized field experiment on Alibaba's Taobao platform. Workers in the treatment condition supervised an agentic AI system that resolved AI-eligible chats, and they continued to handle AI-ineligible chats themselves, whereas control workers resolved all chats without agentic AI. The findings show that AI deployment reduces average chat duration and has limited effects on retrial rates, but substantially lowers ratings for AI-eligible chats. Moreover, the effectiveness of human intervention in AI-eligible chats depends on the nature of the AI failure, post-escalation intervention effort, and intervention timing. Human intervention preserves service quality in algorithm-triggered technical escalations, i.e., unresolved customer issues beyond the AI's capability, but is less effective in algorithm-triggered emotional escalations, i.e., where customers express frustration or dissatisfaction. These differences are partly explained by variation in workers' post-escalation intervention effort across escalation types. In algorithm-triggered emotional escalations, workers showed lower engagement: they sent fewer messages, contributed a smaller share of total chat rounds, and showed less proactivity in information seeking and solution provision. We further find that early intervention is essential for sustaining high post-escalation intervention effort. Finally, we document a positive spillover effect on AI-ineligible chats, as treated workers adapted their multitasking workflow to devote greater attention to these chats. These findings offer implications for human-in-the-loop process design in human-AI collaboration systems.

