We present specialized Large Language Models for theoretical High-Energy Physics, obtained as 20 fine-tuned variants of the 8-billion-parameter Llama-3.1 model. Each variant was trained on arXiv abstracts (through August 2024) drawn from different combinations of the hep-th, hep-ph, and gr-qc categories. For a comparative study, we also trained models on datasets containing abstracts from disparate fields such as the q-bio and cs categories. All models were trained using two distinct Low-Rank Adaptation (LoRA) approaches and varying dataset sizes, and all outperformed the base model on hep-th abstract-completion tasks. We compare performance against leading commercial LLMs (ChatGPT, Claude, Gemini, DeepSeek) and derive insights for further developing specialized language models for High-Energy Theoretical Physics.
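As a minimal sketch of what one such LoRA fine-tuning run could look like, the snippet below adapts a Llama-3.1-8B causal language model on a file of abstracts using the Hugging Face transformers and peft libraries. The abstract does not specify the training stack or hyperparameters, so the library choice, the LoRA rank and target modules, the file name hep_th_abstracts.txt, and all training arguments are illustrative assumptions, not the authors' actual configuration.

```python
# Sketch only: hyperparameters, target modules, and data paths are
# illustrative assumptions, not the paper's reported configuration.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling,
                          Trainer, TrainingArguments)

BASE = "meta-llama/Llama-3.1-8B"  # 8B base model named in the abstract

tokenizer = AutoTokenizer.from_pretrained(BASE)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(BASE)

# LoRA trains small low-rank adapter matrices instead of all 8B weights.
# Rank, alpha, and target modules below are placeholder values.
lora_config = LoraConfig(
    r=16, lora_alpha=32, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)

# Hypothetical dataset: a plain-text file with one arXiv abstract per line.
ds = load_dataset("text", data_files={"train": "hep_th_abstracts.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

ds = ds.map(tokenize, batched=True, remove_columns=["text"])

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="llama31-hepth-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        num_train_epochs=1,
        learning_rate=2e-4,
    ),
    train_dataset=ds["train"],
    # mlm=False gives standard next-token (causal LM) training labels.
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

After training, only the small adapter weights need to be saved per variant, which is one reason a LoRA-style setup makes it practical to produce 20 fine-tuned variants of a single 8B base model.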