Solar activity, including solar flares, coronal mass ejections (CMEs), and geomagnetic storms, can significantly impact satellites, aviation, power grids, data centers, and space missions. Extreme solar events can cause substantial economic damage with limited advance warning, underscoring the importance of early-warning systems, accurate forecasting, and effective education in space science. Although large language models (LLMs) perform well on general tasks, they often lack the domain-specific knowledge and pedagogical capability needed to explain complex space science concepts clearly. We introduce SolarGPT-QA, a question answering system based on a domain-adapted large language model built on the LLaMA-3 base model. The model is trained on scientific literature and on large-scale question-answer data that was generated with GPT-4 and then refined with Grok-3 into a student-friendly storytelling style. Human pairwise evaluations show that SolarGPT-QA outperforms general-purpose models in zero-shot settings and performs competitively with instruction-tuned models on educational explanations in space weather and heliophysics. A small pilot student comprehension study further suggests improved clarity and accessibility of the generated explanations. Ablation experiments indicate that combining domain-adaptive pretraining with pedagogical fine-tuning is important for balancing scientific accuracy and educational effectiveness. This work represents an initial step toward a broader SolarGPT framework for space science education and forecasting.