In orchestrated multi-agent systems, humans often struggle to manage plans because the plans are complex and offer limited transparency. Existing approaches rely on outcome-level supervision, where users verify only final outputs without visibility into intermediate reasoning. We formalize a design space for human-LLM co-planning interactions along three axes: mode (semantic vs. structural), scope (global vs. targeted), and level (low-level vs. high-level edits). We realize this design space in AMBIPOM, a prototype supporting process-level supervision through both semantic and structural interactions. Through a user study, we characterize how users navigate this space, revealing hybrid workflows and effort-control-risk trade-offs; through a controlled benchmark, we analyze how LLMs revise plans under varying scope and revision strategies. Our findings yield design insights for more transparent, controllable, and effective human-AI co-planning. We release code and data at https://github.com/megagonlabs/ambipom.
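As an illustration only, the three axes can be read as a small product space with two values per axis, giving eight interaction types. The sketch below is not taken from the AMBIPOM code base: the names `Mode`, `Scope`, `Level`, `Interaction`, and `enumerate_design_space` are hypothetical and were chosen here to mirror the axis names in the text.

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


# Hypothetical encoding of the three axes named in the abstract.
class Mode(Enum):
    SEMANTIC = "semantic"
    STRUCTURAL = "structural"


class Scope(Enum):
    GLOBAL = "global"
    TARGETED = "targeted"


class Level(Enum):
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Interaction:
    """One point in the design space (assumed representation)."""
    mode: Mode
    scope: Scope
    level: Level


def enumerate_design_space():
    """Return every combination of mode, scope, and level."""
    return [Interaction(m, s, l) for m, s, l in product(Mode, Scope, Level)]


if __name__ == "__main__":
    for interaction in enumerate_design_space():
        print(interaction.mode.value, interaction.scope.value, interaction.level.value)
```

Under this reading, a targeted, low-level structural edit and a global, high-level semantic request are simply two different points in the same space, which is one way to make the hybrid workflows mentioned above concrete.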