The convergence of artificial intelligence and materials science presents a transformative opportunity, but achieving true acceleration in discovery requires moving beyond task-isolated, fine-tuned models toward agentic systems that plan, act, and learn across the full discovery loop. This survey advances a pipeline-centric view that spans corpus curation and pretraining, domain adaptation and instruction tuning, and goal-conditioned agents that interface with simulation and experimental platforms. Unlike prior reviews, we treat the entire process as an end-to-end system to be optimized for tangible discovery outcomes rather than proxy benchmarks. This perspective allows us to trace how upstream design choices, such as data curation and training objectives, can be aligned with downstream experimental success through effective credit assignment. To bridge communities and establish a shared frame of reference, we first present an integrated lens that aligns terminology, evaluation, and workflow stages across AI and materials science. We then analyze the field through two focused lenses. From the AI perspective, the survey details the strengths of large language models (LLMs) in pattern recognition, predictive analytics, and natural language processing, as applied to literature mining, materials characterization, and property prediction. From the materials science perspective, it highlights applications in materials design, process optimization, and the acceleration of computational workflows through integration with external tools (e.g., DFT, robotic labs). Finally, we contrast passive, reactive approaches with agentic design, cataloging current contributions while motivating systems that pursue long-horizon goals with autonomy, memory, and tool use. This survey charts a practical roadmap toward autonomous, safety-aware LLM agents aimed at discovering novel and useful materials.