Despite rapid progress in autonomous web agents, human involvement remains essential for shaping preferences and correcting agent behavior as tasks unfold. However, current agentic systems lack a principled understanding of when and why humans intervene, and often either proceed autonomously past critical decision points or request unnecessary confirmation. In this work, we introduce the task of modeling human intervention to support collaborative web task execution. We collect CowCorpus, a dataset of 400 real-user web navigation trajectories containing over 4,200 interleaved human and agent actions. We identify four distinct patterns of user interaction with agents: hands-off supervision, hands-on oversight, collaborative task-solving, and full user takeover. Leveraging these insights, we train language models (LMs) to anticipate when users are likely to intervene based on their interaction styles, yielding a 61.4-63.4% improvement in intervention prediction accuracy over base LMs. Finally, we deploy these intervention-aware models in live web navigation agents and evaluate them in a user study, finding a 36.8% increase in user-rated agent usefulness. Together, our results show that structured modeling of human intervention leads to more adaptive, collaborative agents.