AI agents have become increasingly significant in various domains, enabling autonomous decision-making and problem-solving. To function effectively, these agents require a planning process that determines the best course of action and then executes the planned actions. In this paper, we present an efficient on-device Planner-Action framework that separates planning and action execution into two distinct components: a planner agent based on Phi-3 Mini, a 3.8-billion-parameter LLM optimized for edge devices, and an action agent using the Octopus model for function execution. The planner agent first responds to user queries by decomposing tasks into a sequence of sub-steps, which are then executed by the action agent. To optimize performance on resource-constrained devices, we employ model fine-tuning instead of in-context learning, reducing computational costs and energy consumption while improving response times. Our approach uses GPT-4 to generate diverse planning queries and responses based on the available functions, with subsequent validation to ensure data quality. We fine-tune the Phi-3 Mini model on this curated dataset, achieving a 97\% success rate in our in-domain test environment. To address multi-domain planning challenges, we developed a multi-LoRA training method that merges weights from LoRAs trained on distinct function subsets. This approach enables flexible handling of complex, multi-domain queries while maintaining computational efficiency on resource-constrained devices. To support further research, we have open-sourced our model weights at \url{https://huggingface.co/NexaAIDev/octopus-planning}. For the demo, please refer to \url{https://www.nexa4ai.com/octo-planner}.
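The multi-LoRA merging idea above can be illustrated with a minimal sketch. This is an assumption-laden toy example, not the paper's actual implementation: it treats each adapter for a given base layer as a low-rank pair $(A, B)$ whose update is $BA$, and merges adapters by a weighted average of their updates. The function name `merge_loras` and the averaging scheme are illustrative choices.

```python
import numpy as np

def merge_loras(loras, weights=None):
    """Merge LoRA adapters for the same base layer by weighted-averaging
    their low-rank updates. Each adapter is an (A, B) pair whose update
    to the frozen base weight is B @ A. Illustrative sketch only; the
    paper's exact merging scheme may differ.
    """
    if weights is None:
        weights = [1.0 / len(loras)] * len(loras)
    # Sum of scaled rank-r updates; result has the base layer's shape.
    return sum(w * (B @ A) for w, (A, B) in zip(weights, loras))

# Toy example: two rank-2 adapters for a 4x4 layer,
# e.g. trained on two distinct function subsets.
rng = np.random.default_rng(0)
lora_a = (rng.normal(size=(2, 4)), rng.normal(size=(4, 2)))  # (A, B)
lora_b = (rng.normal(size=(2, 4)), rng.normal(size=(4, 2)))
merged_delta = merge_loras([lora_a, lora_b])  # shape (4, 4)
```

The merged delta would then be added to the frozen base weight, letting a single merged adapter serve queries spanning both function subsets without loading multiple adapters at inference time.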