Current language model-driven agents often lack mechanisms for effective user participation, which is crucial given the vagueness commonly found in user instructions. Although adept at devising strategies and performing tasks, these agents struggle to seek clarification and grasp precise user intentions. To bridge this gap, we introduce Intention-in-Interaction (IN3), a novel benchmark designed to inspect users' implicit intentions through explicit queries. Next, we propose incorporating model experts as an upstream component in agent designs to enhance user-agent interaction. Employing IN3, we empirically train Mistral-Interact, a powerful model that proactively assesses task vagueness, inquires about user intentions, and refines them into actionable goals before downstream agent task execution begins. Integrating it into the XAgent framework, we comprehensively evaluate the enhanced agent system on user instruction understanding and execution. The evaluation reveals that our approach notably excels at identifying vague user tasks, recovering and summarizing critical missing information, setting precise and necessary agent execution goals, and minimizing redundant tool usage, thus boosting overall efficiency. All data and code are released.
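As a rough illustration of the upstream interaction step described in the abstract (judge whether a task is vague, ask about the missing details, then summarize the answers into a refined goal), the flow might be sketched as follows. This is a minimal sketch under stated assumptions, not the Mistral-Interact implementation: `MissingDetail`, `refine_task`, `find_missing`, and `ask_user` are hypothetical names introduced here, and the model's vagueness judgment is stood in for by a plain callable.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class MissingDetail:
    """One piece of information the task leaves unspecified (hypothetical type)."""
    name: str       # short label, e.g. "budget"
    question: str   # explicit query to put to the user


def refine_task(
    task: str,
    find_missing: Callable[[str], List[MissingDetail]],
    ask_user: Callable[[str], str],
) -> str:
    """Turn a possibly vague task into a more explicit goal.

    `find_missing` stands in for the model's vagueness judgment and
    `ask_user` for the dialogue turn with the user; both are assumptions
    made for this sketch.
    """
    missing = find_missing(task)
    if not missing:
        # Task judged clear: hand it to the downstream agent unchanged.
        return task

    answers = []
    for detail in missing:
        reply = ask_user(detail.question).strip()
        if reply:  # skip details the user declines to specify
            answers.append(f"{detail.name}: {reply}")

    if not answers:
        return task
    # Summarize the recovered details into a single refined goal.
    return f"{task} ({'; '.join(answers)})"


if __name__ == "__main__":
    canned = {"Where to?": "Kyoto", "What budget?": "under $2000"}
    details = [
        MissingDetail("destination", "Where to?"),
        MissingDetail("budget", "What budget?"),
    ]
    print(refine_task("Plan a trip", lambda t: details, lambda q: canned[q]))
    # prints: Plan a trip (destination: Kyoto; budget: under $2000)
```

In a real system the two callables would be backed by the interaction model and a live user; keeping them as parameters simply makes the control flow easy to test in isolation.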