PACuna: Automated Fine-Tuning of Language Models for Particle Accelerators

Navigating the landscape of particle accelerators has become increasingly challenging as the volume of contributions has surged. The devices themselves are intricate and hard to comprehend fully, even within a single facility. To address this, we introduce PACuna, a language model fine-tuned on publicly available accelerator resources such as conference proceedings, pre-prints, and books. We automated data collection and question generation to minimize expert involvement, and we make the data publicly available. PACuna demonstrates proficiency in answering intricate accelerator questions, as validated by experts. Our approach shows that adapting language models to scientific domains, by fine-tuning on technical texts and on auto-generated corpora that capture the latest developments, can yield models that answer some intricate questions that commercially available assistants cannot, and that such models can serve as intelligent assistants for individual facilities.

