Large Language Models (LLMs) have demonstrated remarkable potential in scientific research, particularly in chemistry-related tasks such as molecular design, reaction prediction, and property estimation. While tool-augmented LLMs have been introduced to enhance reasoning and computation in these domains, existing approaches suffer from tool invocation errors and lack effective collaboration among diverse tools, which limits their overall performance. To address these challenges, we propose ChemHTS (Chemical Hierarchical Tool Stacking), a novel method that optimizes tool invocation pathways through a hierarchical stacking strategy. ChemHTS consists of two key stages, tool self-stacking warmup and multi-layer decision optimization, which enable LLMs to refine their tool usage dynamically. We evaluate ChemHTS on four classical chemistry tasks and demonstrate its superiority over strong baselines, including GPT-4o, DeepSeek-R1, and chemistry-specific models such as ChemDFM. Furthermore, we define four distinct tool-stacking behaviors to enhance interpretability, providing insights into the effectiveness of tool collaboration. Our dataset and code are publicly available at \url{https://github.com/Chang-pw/ChemHTS}.