Tool invocation significantly enhances the capabilities of Large Language Models (LLMs), yet challenges persist, particularly in complex task scenarios. Current methods, such as instruction-enhanced reasoning and supervised fine-tuning, often produce unnecessarily long reasoning paths and struggle to verify the correctness of intermediate steps. In this paper, we propose CodeTool, a novel framework for stepwise code generation that improves LLM tool invocation by leveraging the concise and easily verifiable nature of code. CodeTool incorporates two distinct process rewards: the On-the-spot Reward, which provides immediate feedback on the accuracy of each tool invocation, and the Latent Reward, which assesses the contribution of each step toward overall task completion. By maximizing the cumulative On-the-spot and Latent Rewards at each step, LLMs are guided to follow efficient and accurate reasoning paths. Extensive experiments on StableToolBench and RestBench-TMDB demonstrate the superiority of CodeTool over existing approaches.
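The per-step selection rule described above can be illustrated with a minimal sketch. This is not the paper's implementation: the reward functions below are hypothetical stand-ins (a real On-the-spot Reward would verify the tool invocation's execution result, and the Latent Reward would come from a learned process reward model), but the sketch shows the core idea of picking the candidate step that maximizes the cumulative reward.

```python
# Minimal sketch, assuming: candidate next steps are short code snippets,
# the On-the-spot Reward checks whether a snippet executes correctly, and
# the Latent Reward estimates progress toward task completion.

def on_the_spot_reward(candidate: str) -> float:
    """Immediate feedback: 1.0 if the candidate code runs without error.

    Illustrative stand-in; a real system would also validate the tool's
    returned result, and would execute candidates in a sandbox.
    """
    try:
        exec(candidate, {})
        return 1.0
    except Exception:
        return 0.0

def latent_reward(candidate: str) -> float:
    """Stand-in for a learned estimate of the step's contribution to the
    overall task; a real system would score this with a reward model."""
    return 0.0

def select_next_step(candidates: list[str]) -> str:
    """Choose the candidate maximizing the cumulative process reward."""
    return max(candidates, key=lambda c: on_the_spot_reward(c) + latent_reward(c))

candidates = ["result = 1 + 1", "result = undefined_var + 1"]
print(select_next_step(candidates))  # the runnable candidate is selected
```

With the trivial latent reward above, selection reduces to preferring steps that execute correctly; the framework's advantage comes from combining this immediate signal with the learned estimate of long-term progress.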