MechGPT, a language-based strategy for mechanics and materials modeling that connects knowledge across scales, disciplines and modalities

For centuries, researchers have sought ways to connect disparate areas of knowledge. While early scholars (Galileo, da Vinci, etc.) were experts across fields, specialization took hold later. With the advent of Artificial Intelligence, we can now explore relationships across areas (e.g., mechanics-biology) or disparate domains (e.g., failure mechanics-art). To achieve this, we use a Large Language Model (LLM) fine-tuned on a subset of knowledge in multiscale materials failure. The approach uses a general-purpose LLM to distill question-answer pairs from raw sources, followed by LLM fine-tuning. The resulting MechGPT LLM foundation model is used in a series of computational experiments to explore its capacity for knowledge retrieval, various language tasks, hypothesis generation, and connecting knowledge across disparate areas. While the model has some ability to recall knowledge from training, we find that LLMs are particularly useful for extracting structural insights through Ontological Knowledge Graphs. These interpretable graph structures provide explanatory insights, frameworks for new research questions, and visual representations of knowledge that can also be used in retrieval-augmented generation. Three versions of MechGPT are discussed, ranging in size from 13 billion to 70 billion parameters and reaching context lengths of more than 10,000 tokens. This provides ample capacity for sophisticated retrieval-augmented strategies, for agent-based modeling in which multiple LLMs interact collaboratively and/or adversarially, for the incorporation of new data from the literature or web searches, and for multimodality.
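To make the idea of an Ontological Knowledge Graph more concrete, the following is a minimal sketch, not the implementation used for MechGPT. It assumes that relations have already been extracted as (subject, relation, object) triples; the triples, the function names `build_graph` and `find_path`, and the use of a breadth-first search to link two concepts are all illustrative assumptions.

```python
from collections import defaultdict, deque

# Hypothetical triples, invented for illustration only. In the workflow
# described above, an LLM would extract such relations from source text.
TRIPLES = [
    ("collagen", "is a", "hierarchical material"),
    ("hierarchical material", "exhibits", "crack deflection"),
    ("crack deflection", "increases", "fracture toughness"),
    ("nacre", "is a", "hierarchical material"),
]


def build_graph(triples):
    """Store each triple as a labeled edge, plus a reverse edge for traversal."""
    graph = defaultdict(list)
    for subject, relation, obj in triples:
        graph[subject].append((relation, obj))
        graph[obj].append((f"inverse of '{relation}'", subject))
    return graph


def find_path(graph, start, goal):
    """Breadth-first search; returns [node, relation, node, ...] or None."""
    queue = deque([(start, [start])])
    visited = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for relation, neighbor in graph.get(node, []):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, path + [relation, neighbor]))
    return None


if __name__ == "__main__":
    graph = build_graph(TRIPLES)
    print(find_path(graph, "collagen", "fracture toughness"))
```

A path returned by `find_path` is a readable chain of concepts and relations. In a retrieval-augmented setting, such a chain could be placed in the prompt as context, which is one plausible way a graph of this kind might feed back into generation.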
