This paper studies the problem of solving complex chemistry problems with large language models (LLMs). Despite their extensive general knowledge, LLMs such as GPT-4 struggle with chemistry problems, which require faithful, grounded reasoning over diverse chemical knowledge and an integrative understanding of chemical interactions. We propose InstructChem, a new structured reasoning approach that substantially boosts LLMs' chemical reasoning capabilities. InstructChem explicitly decomposes the reasoning into three critical phases: chemical formulae generation, in which the LLM produces the formulae that ground the subsequent reasoning; step-by-step reasoning, which performs multi-step derivations with the identified formulae to reach a preliminary answer; and iterative review-and-refinement, which steers the LLM to progressively revise the previous phases for increasing confidence, leading to the final high-confidence answer. We conduct extensive experiments on four chemistry challenges: quantum chemistry, quantum mechanics, physical chemistry, and chemical kinetics. Our approach significantly enhances GPT-4 on chemistry reasoning, yielding an 8% average absolute improvement and a 30% peak improvement. We further use the reasoning generated by GPT-4 to fine-tune smaller LMs (e.g., Vicuna) and observe strong improvements in these smaller models. This validates our approach and shows that it enables LLMs to generate high-quality reasoning.
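The three phases described above can be sketched as a simple control loop. The following is a minimal illustration under stated assumptions, not the authors' implementation: the `LLM` callable, the prompt wording, the `confidence:` and `answer:` line conventions, the `threshold` and `max_rounds` parameters, and all helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable

# Hypothetical interface: any function that maps a prompt to a completion.
LLM = Callable[[str], str]


@dataclass
class Attempt:
    formulae: str
    reasoning: str
    confidence: float = 0.0


def read_field(text: str, key: str, default: str = "") -> str:
    """Return the value of the last 'key: value' line in text, or default."""
    for line in reversed(text.strip().splitlines()):
        if line.lower().startswith(key + ":"):
            return line.split(":", 1)[1].strip()
    return default


def score(llm: LLM, question: str, attempt: Attempt) -> float:
    """Ask the model to review an attempt and report a confidence (assumed format)."""
    review = llm(
        f"REVIEW\nQuestion: {question}\nFormulae: {attempt.formulae}\n"
        f"Reasoning: {attempt.reasoning}\nEnd with 'confidence: <0-1>'."
    )
    try:
        return float(read_field(review, "confidence", "0"))
    except ValueError:
        return 0.0


def reason(llm: LLM, question: str, formulae: str) -> str:
    """Phase 2: multi-step derivation grounded in the generated formulae."""
    return llm(
        f"REASON\nQuestion: {question}\nFormulae: {formulae}\n"
        "Solve step by step and end with 'answer: <value>'."
    )


def solve(question: str, llm: LLM, max_rounds: int = 3,
          threshold: float = 0.9) -> Attempt:
    # Phase 1: generate the formulae that ground the later reasoning.
    formulae = llm(f"FORMULAE\nList the formulae needed for: {question}")
    best = Attempt(formulae, reason(llm, question, formulae))
    best.confidence = score(llm, question, best)

    # Phase 3: review and refine until confident or out of rounds.
    current = best
    for _ in range(max_rounds):
        if best.confidence >= threshold:
            break
        formulae = llm(
            f"REVISE_FORMULAE\nQuestion: {question}\n"
            f"Previous formulae: {current.formulae}"
        )
        current = Attempt(formulae, reason(llm, question, formulae))
        current.confidence = score(llm, question, current)
        if current.confidence > best.confidence:
            best = current  # keep the highest-confidence attempt seen so far
    return best
```

Passing the model as a plain callable keeps the sketch independent of any particular API client; a real setup would wrap a chat-completion call in a function with this signature and would likely need more robust parsing of the model's output.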