Driven by curiosity, humans have continually sought to explore and understand the world around them, inventing a variety of tools to satisfy this inquisitiveness. Although humans cannot process and memorize vast amounts of information, they excel at critical thinking, planning, reflection, and using available tools to interact with and interpret the world, which lets them find answers efficiently. Recent advances in large language models (LLMs) suggest that machines may also possess these human-like capabilities and can exhibit powerful abilities even with a constrained parameter count. In this paper, we introduce KwaiAgents, a generalized information-seeking agent system based on LLMs. KwaiAgents uses an LLM as its cognitive core, enabling the agent to understand a user's query and behavior guidelines and to reference external documents. The agent can also update and retrieve information from its internal memory, plan and execute actions using a time-aware search-browse toolkit, and ultimately provide a comprehensive response. We further investigate the system's performance when it is powered by LLMs less advanced than GPT-4, and introduce the Meta-Agent Tuning (MAT) framework, designed to ensure that even an open-source 7B or 13B model performs well across many agent systems. We use both benchmark and human evaluations to systematically validate these capabilities. Extensive experiments show that our agent system outperforms other autonomous agents and that our fine-tuned LLMs have enhanced generalized agent abilities.
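The agent described above combines an LLM core, an internal memory, and a time-aware search tool in a plan-and-act loop. The Python sketch below illustrates that kind of loop only in broad strokes and is not the KwaiAgents implementation. `Memory`, `time_aware_search`, `stub_llm`, `run_agent`, the `search:`/`finish:` action format, and the one-entry in-memory corpus are all hypothetical stand-ins. A real system would call an actual LLM and a web search/browse API.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Callable

# Hypothetical signatures: an "LLM" maps (query, memory context, observations) to an
# action string, and a tool maps (argument, today's date) to an observation string.
Policy = Callable[[str, list[str], list[str]], str]
Tool = Callable[[str, date], str]


@dataclass
class Memory:
    """Hypothetical memory: stores (query, answer) notes, retrieves by word overlap."""
    notes: list[tuple[str, str]] = field(default_factory=list)

    def update(self, key: str, value: str) -> None:
        self.notes.append((key, value))

    def retrieve(self, query: str, k: int = 2) -> list[str]:
        words = set(query.lower().split())
        overlap = lambda note: len(words & set(note[0].lower().split()))
        ranked = sorted(self.notes, key=overlap, reverse=True)
        # Keep only notes that share at least one word with the query.
        return [value for key, value in ranked[:k] if overlap((key, value)) > 0]


# Hypothetical stand-in for a web index; a real tool would query a search API.
CORPUS = {"kwaiagents": "KwaiAgents is an information-seeking agent system."}


def time_aware_search(query: str, today: date) -> str:
    """Stub search tool that stamps each result with the current date."""
    hits = [text for key, text in CORPUS.items() if key in query.lower()]
    return f"[{today.isoformat()}] " + (hits[0] if hits else "no results")


def stub_llm(query: str, context: list[str], observations: list[str]) -> str:
    """Stub policy: answer from memory if possible, else search once, then finish."""
    if context:
        return f"finish: {context[0]}"
    if not observations:
        return f"search: {query}"
    return f"finish: {observations[-1]}"


def run_agent(query: str, llm: Policy, memory: Memory,
              tools: dict[str, Tool], today: date, max_steps: int = 3) -> str:
    context = memory.retrieve(query)              # recall relevant past notes
    observations: list[str] = []
    for _ in range(max_steps):
        action = llm(query, context, observations)
        name, _, arg = action.partition(":")      # "tool: argument"
        arg = arg.strip()
        if name == "finish":
            memory.update(query, arg)             # write the answer back to memory
            return arg
        observations.append(tools[name](arg, today))
    return "No answer within step budget."
```

For example, `run_agent("What is KwaiAgents?", stub_llm, Memory(), {"search": time_aware_search}, date(2023, 12, 1))` searches once and then finishes. Passing `today` explicitly keeps the time-aware behavior deterministic. Repeating the query with the same `Memory` object is answered from memory without calling any tool.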