LLaMP: Large Language Model Made Powerful for High-fidelity Materials Knowledge Retrieval and Distillation

Reducing hallucination in Large Language Models (LLMs) is imperative for their use in the sciences, where reproducibility is crucial. However, LLMs inherently lack long-term memory, making it a nontrivial, ad hoc, and inevitably biased task to fine-tune them on domain-specific literature and data. Here we introduce LLaMP, a multimodal retrieval-augmented generation (RAG) framework of multiple data-aware reasoning-and-acting (ReAct) agents that dynamically interact with computational and experimental data on the Materials Project (MP). Without fine-tuning, LLaMP demonstrates an ability to comprehend and integrate various modalities of materials science concepts, fetch relevant data stores on the fly, process higher-order data (such as crystal structures and elastic tensors), and summarize multi-step procedures for solid-state synthesis. We show that LLaMP effectively corrects errors in GPT-3.5's intrinsic knowledge, reducing a 5.21% mean absolute percentage error (MAPE) on frequently documented bandgaps and a significant 1103.54% MAPE on formation energies, errors that GPT-3.5 seems to derive from mixed data sources. Additionally, LLaMP substantially reduces the hallucinated volumetric strain in a diamond cubic silicon structure from 66.3% to 0. The proposed framework offers an intuitive and nearly hallucination-free approach to exploring materials informatics and establishes a pathway for knowledge distillation and fine-tuning other language models. We envision the framework as a valuable component for scientific hypotheses and a foundation for future autonomous laboratories where multiple LLM agents communicate and cooperate with robotics to drive material synthesis and chemical reactions without hard-coded human logic and intervention.
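The two error measures quoted above can be made concrete with a short sketch. It assumes the standard definitions of mean absolute percentage error and of volumetric strain relative to a reference cell volume; the abstract does not spell out either formula. The function names `mape` and `volumetric_strain` and all numbers in the usage lines are hypothetical illustrations, not data from the paper.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent (standard definition, assumed here)."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


def volumetric_strain(volume, reference_volume):
    """Relative change in cell volume versus a reference volume, in percent."""
    return 100.0 * (volume - reference_volume) / reference_volume


# Hypothetical bandgaps in eV: one prediction is off by 10%, the other is exact.
bandgap_error = mape([1.1, 2.0], [1.0, 2.0])

# Hypothetical formation energies in eV/atom: a reference value close to zero
# inflates the percentage error even when the absolute error is modest.
formation_error = mape([-1.2], [-0.1])

# Hypothetical cell volumes in cubic angstroms.
strain = volumetric_strain(166.3, 100.0)

print(f"bandgap MAPE: {bandgap_error:.2f}%")
print(f"formation energy MAPE: {formation_error:.2f}%")
print(f"volumetric strain: {strain:.1f}%")
```

Because MAPE divides by the reference value, quantities that sit near zero, as formation energies often do, can produce percentage errors far above 100% without a correspondingly large absolute error.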
