Understanding how climate change affects us and learning about available solutions are key steps toward empowering individuals and communities to mitigate and adapt to it. As Large Language Models (LLMs) rise in popularity, it is necessary to assess their capabilities in this domain. In this study, we present a comprehensive evaluation framework, grounded in science communication principles, for analyzing LLM responses to climate change topics. Our framework emphasizes both the presentational and the epistemological adequacy of answers, enabling a fine-grained analysis of LLM generations. Spanning 8 dimensions, it identifies up to 30 distinct issues in model outputs. The task is a real-world example of a growing class of challenging problems where AI can complement and enhance human performance. We introduce a novel and practical protocol for scalable oversight that uses AI assistance and relies on raters with relevant educational backgrounds. We evaluate several recent LLMs and analyze the results in depth, shedding light on both the potential and the limitations of LLMs in the realm of climate communication.