Autonomous software agents promise to increase developer productivity, but they make mistakes and exhibit novel failure modes, which makes human oversight central to successful human-agent collaboration. Existing research on agent oversight is largely conceptual: normative frameworks exist, but less is known about how users actually oversee agents. In this paper, we address this gap by providing early empirical anchors for the theoretical discourse on agent oversight. Drawing on interviews with 17 experienced developers, we conduct an exploratory inquiry into what forms of emergent oversight work developers perform, when, and how. We also document the oversight challenges developers face and the strategies they have started using to address them. We identify at least four forms of emergent oversight work: a priori control, co-planning, real-time monitoring, and post hoc review. We show that oversight work is not only reactive and retrospective, as portrayed in existing research, but also preventative and proactive. We describe situated oversight challenges (e.g., difficulty reviewing agent-generated code) and outline heuristics developers adopt to address such challenges (e.g., treating test results as guarantees of code correctness). We conclude with high-level takeaways, future research directions, implications for the human-centered design of software agents and for software engineering practice, and limitations of our research.