The increasing demand for spatiotemporal data and modeling tasks in the geosciences has made geospatial code generation a critical factor in enhancing productivity. Although large language models (LLMs) have demonstrated potential in code generation, they often refuse to generate code or hallucinate on geospatial tasks because they lack domain-specific knowledge and code corpora. To address these challenges, this paper presents and open-sources the GeoCode-PT and GeoCode-SFT corpora, along with the GeoCode-Eval evaluation dataset. Additionally, by leveraging QLoRA and LoRA for pretraining and fine-tuning, we introduce GeoCode-GPT-7B, the first LLM focused on geospatial code generation, fine-tuned from Code Llama-7B. Furthermore, we establish a comprehensive geospatial code evaluation framework that incorporates option matching, expert validation, and prompt engineering scoring for LLMs, and we systematically evaluate GeoCode-GPT-7B on the GeoCode-Eval dataset. Experimental results show that GeoCode-GPT outperforms other models in multiple-choice accuracy by 9.1% to 32.1%, in code summarization ability by 1.7% to 25.4%, and in code generation capability by 1.2% to 25.1%. This paper provides a solution and empirical validation for enhancing LLM performance in geospatial code generation, extends the boundaries of domain-specific model applications, and offers insights into unlocking the potential of such models in this domain.