The dominant paradigm for AI agents is an "on-the-fly" loop in which agents synthesize plans and execute actions within seconds or minutes in response to user prompts. We argue that this paradigm short-circuits the disciplined software engineering (SE) processes -- iterative design, rigorous testing, adversarial evaluation, staged deployment, and more -- that have delivered the (relatively) reliable and secure systems we use today. By focusing on rapid, real-time synthesis, are AI agents effectively delivering users improvised prototypes rather than systems fit for the high-stakes scenarios to which those users may unwittingly apply them? This paper argues for integrating rigorous SE processes into the agentic loop to produce production-grade, hardened, and deterministically constrained agent *workflows* that substantially outperform the potentially brittle and vulnerable results of on-the-fly synthesis. Doing so may require extra compute and time; if so, the cost of rigor must be amortized through reuse across a broad user community. We envision an *AI Workflow Store* of hardened, reusable workflows that agents can invoke with far greater reliability and security than improvised tool chains. We outline the research challenges of this vision, which stem from a broader flexibility-robustness tension that, we argue, can be navigated effectively only by moving beyond the "on-the-fly" paradigm.