Integrating large language models (LLMs) into personal assistants such as Xiao Ai and Blue Heart V improves their ability to interact with humans, solve complex tasks, and manage IoT devices. Such assistants are also called LLM-driven agents. On receiving a user request, an LLM-driven agent generates a plan with an LLM, executes the plan through various tools, and returns the response to the user. In this process, generating a plan with an LLM can take tens of seconds, which significantly degrades the user experience. Analysis of a real-world dataset shows that about 30% of the requests received by LLM-driven agents are identical or similar, so previously generated plans can be reused to reduce latency. However, request similarity is difficult to define accurately by directly comparing the original request texts. Moreover, the diverse expressions of natural language and the unstructured format of plan texts make plan reuse challenging to implement. To address these issues, we present and implement AgentReuse, a plan reuse mechanism for LLM-driven agents. AgentReuse leverages the similarities and differences among request semantics and uses intent classification to evaluate the similarity between requests and enable plan reuse. Experimental results on a real-world dataset show that AgentReuse achieves a 93% effective plan reuse rate, an F1 score of 0.9718 and an accuracy of 0.9459 in evaluating request similarity, and a 93.12% reduction in latency compared with baselines that do not use the reuse mechanism.
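The idea of reusing plans across requests that share an intent can be illustrated with a minimal sketch. This is not the AgentReuse implementation: the regex-based `classify` function, the intent names, the `PlanCache` class, and the `fake_llm_planner` stand-in are all hypothetical and chosen only for illustration. The sketch assumes that requests with the same intent differ only in their parameter values, so a cached plan can be stored as a template and refilled.

```python
import re
from typing import Callable, Dict, Optional, Tuple

# Hypothetical intents and patterns, for illustration only. A real agent
# would use a trained intent classifier and parameter extractor.
INTENT_PATTERNS: Dict[str, re.Pattern] = {
    "set_alarm": re.compile(
        r"\b(?:alarm|wake me)\b.*?(?P<time>\d{1,2}(?::\d{2})?\s*(?:am|pm))", re.I
    ),
    "play_music": re.compile(r"\bplay\s+(?P<song>.+)", re.I),
}


def classify(request: str) -> Tuple[Optional[str], Dict[str, str]]:
    """Return (intent, parameters), or (None, {}) if no intent matches."""
    for intent, pattern in INTENT_PATTERNS.items():
        match = pattern.search(request)
        if match:
            return intent, match.groupdict()
    return None, {}


def fake_llm_planner(request: str) -> str:
    """Stand-in for the slow LLM call that generates a plan text."""
    _, params = classify(request)
    args = ", ".join(f"{name}={value}" for name, value in params.items())
    return f"1. parse request; 2. call tool({args}); 3. report result"


class PlanCache:
    """Caches one plan template per intent and refills its parameters."""

    def __init__(self, planner: Callable[[str], str]) -> None:
        self.planner = planner
        self.templates: Dict[str, str] = {}
        self.hits = 0
        self.llm_calls = 0

    def get_plan(self, request: str) -> str:
        intent, params = classify(request)
        template = self.templates.get(intent) if intent else None
        if template is not None:
            # Cache hit: fill the stored template with the new parameters.
            self.hits += 1
            plan = template
            for name, value in params.items():
                plan = plan.replace("{" + name + "}", value)
            return plan
        # Cache miss: pay the planning latency once.
        self.llm_calls += 1
        plan = self.planner(request)
        if intent:
            # Simplification: turn the plan into a template by replacing each
            # parameter value with a named placeholder via plain substitution.
            template = plan
            for name, value in params.items():
                template = template.replace(value, "{" + name + "}")
            self.templates[intent] = template
        return plan


if __name__ == "__main__":
    cache = PlanCache(fake_llm_planner)
    print(cache.get_plan("Set an alarm for 7 am"))      # planner is called
    print(cache.get_plan("Please wake me at 9:30 pm"))  # template is reused
    print(cache.llm_calls, cache.hits)
```

In this sketch the two alarm requests have different wording and different times, yet the second one is served from the cached template without calling the planner again. Requests with no recognized intent always fall through to the planner and are not cached.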