Large language models (LLMs) are increasingly used to describe and evaluate cities, yet the cultural structure of their urban judgments remains understudied. Here we introduce a measurement framework for testing whether LLM-based urban perception is culturally neutral, using a globally stratified street-view image dataset. Across three frontier multimodal models, both open-ended descriptions and structured scores show that the models' neutral baseline lies closer to regional framings associated with Europe and North America than to other cultural framings. Comparisons between AI and human urban perception further show that prompting can move AI responses closer to specific regional human descriptions but fails to recover the diversity of human responses: it flattens observed demographic patterns and introduces a sentiment-based self-favouring bias. These results indicate a systematic risk in treating AI as a neutral tool for urban tasks, especially when model outputs are used to compare, evaluate or represent cities across cultural contexts.