In response to the call for agent-based solutions that leverage the ever-increasing capabilities of the deep-model ecosystem, we introduce Hive, a comprehensive solution for selecting appropriate models and then planning a set of atomic actions that satisfy end users' instructions. Hive operates over sets of models and, upon receiving natural language instructions (i.e., user queries), schedules and executes explainable plans of atomic actions. These actions can involve one or more of the available models to achieve the overall task while respecting the end users' specific constraints. Notably, Hive supports tasks with multi-modal inputs and outputs, which allows it to address complex, real-world queries. Our system plans complex chains of actions while guaranteeing explainability, using an LLM-based formal logic backbone empowered by PDDL operations. To offer a comprehensive evaluation of the multi-modal capabilities of agent systems, we introduce the MuSE benchmark. Our findings show that our framework sets a new state of the art for task selection, outperforming competing systems that plan operations across multiple models, while offering transparency guarantees and fully adhering to user constraints.
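To make the idea of planning a chain of atomic actions across several models more concrete, the following is a minimal sketch and not Hive's implementation. It assumes each model can be described by a single input and output modality, and it substitutes a plain breadth-first search for a PDDL planner. The `Action` type, the `plan` function, and all action names are hypothetical and chosen only for illustration.

```python
from collections import deque
from typing import NamedTuple, Optional


class Action(NamedTuple):
    """A hypothetical atomic action backed by one model."""
    name: str   # illustrative model/action name (assumption)
    needs: str  # input modality the action consumes
    gives: str  # output modality the action produces


def plan(actions: list[Action], start: str, goal: str) -> Optional[list[str]]:
    """Return the shortest chain of action names turning `start` into `goal`.

    Breadth-first search over modalities; a stand-in for a real PDDL planner.
    Returns None when no chain of the given actions reaches the goal.
    """
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        state, steps = queue.popleft()
        if state == goal:
            return steps
        for action in actions:
            if action.needs == state and action.gives not in seen:
                seen.add(action.gives)
                queue.append((action.gives, steps + [action.name]))
    return None


# Hypothetical model pool, for illustration only.
ACTIONS = [
    Action("speech_to_text", "audio", "text"),
    Action("translate", "text", "translated_text"),
    Action("text_to_image", "text", "image"),
    Action("caption_image", "image", "text"),
]

if __name__ == "__main__":
    print(plan(ACTIONS, "audio", "image"))
```

Because the result is an ordered list of named steps, each step can be shown to the user, which is the sense in which such a plan is inspectable. A formal planner would additionally handle preconditions, effects, and user constraints that this sketch leaves out.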