SAGE: Tool-Augmented LLM Task Solving Strategies in Scalable Multi-Agent Environments

Large language models (LLMs) have proven effective in question-answering scenarios, but real-world applications often require access to tools for retrieving live information or performing actions. To this end, LLMs can be extended with tools, which are usually defined in advance and may also allow for fine-tuning to specific use cases. However, rapidly evolving software landscapes and individual services require the constant development and integration of new tools. Domain- or company-specific tools can greatly increase the usefulness of an LLM, but such custom tools can be difficult to integrate, or the LLM may fail to understand and use them reliably. We therefore need strategies for defining new tools and integrating them into the LLM dynamically, as well as robust and scalable zero-shot prompting methods that use those tools efficiently. In this paper, we present SAGE, a specialized conversational AI interface based on the OPACA framework for tool discovery and execution. The integration with OPACA makes it easy to add new tools or services for the LLM to use, while SAGE itself is highly extensible and modular. This makes it possible not only to switch seamlessly between different models (e.g., GPT, LLAMA), but also to add and select prompting methods, including various setups of differently prompted agents for selecting and executing tools and evaluating the results. We implemented a number of task-solving strategies that use agentic concepts and prompting methods at varying degrees of complexity, and evaluated them against a comprehensive set of benchmark services. The results are promising and highlight the distinct strengths and weaknesses of the different task-solving strategies. SAGE, the OPACA framework, the benchmark services, and the results are all available as Open Source/Open Data on GitHub.
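The abstract describes tools that are registered at runtime and task-solving strategies that can be swapped independently of the underlying model. The following Python sketch illustrates that separation under stated assumptions: `Tool`, `ToolRegistry`, `SingleAgentStrategy`, `SelectEvaluateStrategy`, `fake_llm`, and the `get_temperature` tool are hypothetical names introduced for illustration, not the SAGE or OPACA API, and the LLM is replaced by a deterministic stub. The second strategy only loosely mirrors the idea of differently prompted agents for selecting tools and evaluating results.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Protocol


@dataclass
class Tool:
    """A callable action exposed to the LLM (hypothetical structure, not the OPACA API)."""
    name: str
    description: str
    parameters: Dict[str, str]  # parameter name -> type name
    func: Callable[..., Any]


@dataclass
class ToolRegistry:
    """Holds tools that can be added while the system is running."""
    tools: Dict[str, Tool] = field(default_factory=dict)

    def register(self, tool: Tool) -> None:
        self.tools[tool.name] = tool

    def describe(self) -> str:
        # Serialize tool signatures so they can be placed in a zero-shot prompt.
        return json.dumps([
            {"name": t.name, "description": t.description, "parameters": t.parameters}
            for t in self.tools.values()
        ])

    def call(self, name: str, args: Dict[str, Any]) -> Any:
        return self.tools[name].func(**args)


# An LLM is modelled as any function from prompt text to completion text,
# so different models can be swapped in without changing the strategies.
LLM = Callable[[str], str]


class Strategy(Protocol):
    def solve(self, query: str, registry: ToolRegistry, llm: LLM) -> str: ...


class SingleAgentStrategy:
    """Simplest strategy: one prompt selects a tool call, whose result is returned."""

    def solve(self, query: str, registry: ToolRegistry, llm: LLM) -> str:
        prompt = (
            f"Tools: {registry.describe()}\n"
            f"Question: {query}\n"
            'Answer with JSON: {"tool": <name>, "args": {...}}'
        )
        choice = json.loads(llm(prompt))
        result = registry.call(choice["tool"], choice["args"])
        return f"{choice['tool']} returned {result}"


class SelectEvaluateStrategy:
    """Two differently prompted roles: a selector picks a tool call and an
    evaluator checks the accumulated results, possibly requesting another round."""

    def __init__(self, max_rounds: int = 3) -> None:
        self.max_rounds = max_rounds

    def solve(self, query: str, registry: ToolRegistry, llm: LLM) -> str:
        history: List[str] = []
        for _ in range(self.max_rounds):
            select_prompt = (
                "ROLE: selector\n"
                f"Tools: {registry.describe()}\n"
                f"Question: {query}\nHistory: {history}\n"
                'Reply with JSON {"tool": name, "args": {...}}'
            )
            choice = json.loads(llm(select_prompt))
            result = registry.call(choice["tool"], choice["args"])
            history.append(f"{choice['tool']}({choice['args']}) -> {result}")
            verdict = llm(
                f"ROLE: evaluator\nQuestion: {query}\nHistory: {history}\n"
                "Reply DONE: <answer> or CONTINUE"
            )
            if verdict.startswith("DONE:"):
                return verdict[len("DONE:"):].strip()
        return "No answer within round limit; last steps: " + "; ".join(history)


def fake_llm(prompt: str) -> str:
    # Deterministic stand-in for a real model (e.g. GPT or LLAMA), for the demo only.
    if prompt.startswith("ROLE: evaluator"):
        return "DONE: It is 21 degrees in Berlin." if "21" in prompt else "CONTINUE"
    return json.dumps({"tool": "get_temperature", "args": {"city": "Berlin"}})


# Tools can be registered at any time; strategies see them on their next prompt.
registry = ToolRegistry()
registry.register(Tool(
    "get_temperature", "Current temperature in a city, in degrees Celsius",
    {"city": "string"}, lambda city: {"Berlin": 21}[city],
))

question = "What is the temperature in Berlin?"
print(SingleAgentStrategy().solve(question, registry, fake_llm))     # get_temperature returned 21
print(SelectEvaluateStrategy().solve(question, registry, fake_llm))  # It is 21 degrees in Berlin.
```

Because strategies receive the registry and the model as arguments, a newly registered tool is picked up without changing any strategy code, and a real model client can replace `fake_llm` as long as it maps prompt text to completion text. A practical system would also need to validate the model's JSON output and handle unknown tool names, which this sketch omits.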

翻译：大型语言模型（LLM）在问答场景中已表现出优异性能，但实际应用通常需要借助工具获取实时信息或执行操作。为此，可通过工具对LLM进行扩展，这些工具通常需预先定义，并允许针对特定用例进行微调。然而，快速演进的软件生态与个体服务要求持续开发和集成新工具。领域或企业专用工具能显著提升LLM的实用性，但此类定制工具的集成可能存在问题，或LLM可能无法可靠理解与使用它们。为此，我们需要动态定义新工具并将其集成至LLM的策略，以及能够高效利用这些工具的稳健、可扩展的零样本提示方法。本文提出SAGE——基于OPACA工具发现与执行框架的专用对话式人工智能接口。与OPACA的集成使得为LLM添加新工具或服务变得简便，同时SAGE本身具备丰富的可扩展性与模块化特性。这不仅支持在不同模型（如GPT、LLAMA）间无缝切换，还能添加和选择提示方法，涉及通过不同提示配置的智能体来选择和执行工具并评估结果。我们实现了多种任务解决策略，在不同复杂程度上利用智能体概念与提示方法，并针对一组综合性基准服务进行了评估。结果展现出良好前景，凸显了不同任务解决策略的独特优势与局限。SAGE与OPACA框架，以及各类基准服务与评估结果，均已在GitHub上以开源/开放数据形式发布。