Driven by curiosity, humans have continually sought to explore and understand the world around them, inventing a variety of tools along the way. Although the human brain cannot process and memorize vast amounts of information, humans excel at critical thinking, planning, reflection, and using available tools to interact with and interpret the world, which lets them find answers efficiently. Recent advances in large language models (LLMs) suggest that machines might also possess these human-like capabilities, allowing them to exhibit powerful abilities even with a constrained parameter count. In this paper, we introduce KwaiAgents, a generalized information-seeking agent system based on LLMs. KwaiAgents employs an LLM as its cognitive core, which can understand a user's query and behavior guidelines and can reference external documents. The agent can also update and retrieve information from its internal memory, plan and execute actions using a time-aware search-browse toolkit, and ultimately provide a comprehensive response. We further investigate the system's performance when powered by LLMs less advanced than GPT-4, and introduce the Meta-Agent Tuning (MAT) framework, designed to ensure that even an open-source 7B or 13B model performs well in many agent systems. We use both benchmark and human evaluations to systematically validate these capabilities. Extensive experiments show that our agent system outperforms other autonomous agents and highlight the enhanced generalized agent abilities of our fine-tuned LLMs.