Large language models (LLMs) are increasingly used in sustainability-related decision support, reporting, and public communication, yet little systematic evidence exists on the environmental attitudes embedded in their outputs. This paper develops a benchmark for evaluating environmental cognition, affect, and behavioural recommendations in LLMs and applies it to 31 widely used proprietary and open-weight models. Drawing on questions from established environmental awareness surveys and additional sustainability-related behavioural measures, we compare LLM responses 1) across models and 2) between models and human survey benchmarks from Germany, and we assess the robustness of these responses across prompting conditions. We find that many LLMs align more closely with environmentally progressive attitudes than the average survey respondent: they exhibit higher levels of environmental affect and cognition and recommend behaviours associated with substantial potential CO2 reductions. At the same time, we observe no systematic relationship between sustainability-oriented responses and model origin, size, or release context. However, the models are context-sensitive: their responses can be steered through persona-based prompting, and they show sycophantic shifts that mirror user-specified ideological positions. This raises concerns about steerability and normative reliability in real-world deployments. Our findings provide a reusable evaluation framework for assessing sustainability-related value alignment in LLMs and highlight the importance of governance, transparency, and critical oversight as AI systems become increasingly embedded in sustainability transformations and public decision-making.