Solar activity, including solar flares, coronal mass ejections (CMEs), and geomagnetic storms, can significantly impact satellites, aviation, power grids, data centers, and space missions. Extreme solar events can cause substantial economic damage if not predicted in advance, highlighting the importance of accurate forecasting and effective education in space science. Although large language models (LLMs) perform well on general tasks, they often lack the domain-specific knowledge and pedagogical capability needed to clearly explain complex space science concepts. We introduce SolarGPT-QA, a question-answering system based on a domain-adapted large language model built on the LLaMA-3 base model. The model is trained on scientific literature and large-scale question-answer data generated with GPT-4 and refined with Grok-3 in a student-friendly storytelling style. Human pairwise evaluations show that SolarGPT-QA outperforms general-purpose models in zero-shot settings and achieves competitive performance against instruction-tuned models on educational explanations in space weather and heliophysics. A small pilot study of student comprehension further suggests improved clarity and accessibility of the generated explanations. Ablation experiments indicate that combining domain-adaptive pretraining with pedagogical fine-tuning is important for balancing scientific accuracy and educational effectiveness. This work represents an initial step toward a broader SolarGPT framework for space science education and forecasting.