Climate change presents significant challenges to the global community, making it imperative to raise widespread awareness of the climate crisis and to educate users about low-carbon living. Artificial intelligence, particularly large language models (LLMs), has emerged as a powerful tool for mitigating the climate crisis, owing to LLMs' extensive knowledge, broad user base, and natural-language interaction capabilities. However, despite the growing body of research on climate change, comprehensive assessments of the climate crisis knowledge held by LLMs are still lacking. This paper addresses this gap by proposing an automatic evaluation framework. We employ a hybrid data acquisition approach that combines data synthesis with manual collection to compile a diverse set of questions related to the climate crisis. These questions cover various aspects of climate change, including its causes, impacts, mitigation strategies, and adaptation measures. We then use prompt engineering to evaluate the models' knowledge based on the collected questions and the answers the models generate. We also propose a comprehensive set of metrics for evaluating climate crisis knowledge, incorporating indicators from 10 different perspectives. Experimental results show that our method is effective in evaluating LLMs' knowledge of the climate crisis. We evaluate several state-of-the-art LLMs and find that their knowledge falls short in terms of timeliness.