Political polling is a multibillion-dollar industry with outsized influence on the societal trajectory of the United States and nations around the world. However, it faces mounting pressures on its cost, availability, and accuracy. At the same time, artificial intelligence (AI) chatbots, powered by increasingly sophisticated large language models (LLMs), have become compelling stand-ins for human behavior. Could AI chatbots anticipate public opinion on controversial issues well enough to be used by campaigns, interest groups, and polling firms? We have developed a prompt engineering methodology for eliciting human-like survey responses from ChatGPT. Each simulated response represents the answer to a policy question of a person described by a set of demographic factors, and comprises both an ordinal numeric response score and a textual justification. We execute large-scale experiments, querying for thousands of simulated responses at a cost far lower than that of human surveys. We compare the simulated data to human issue polling data from the Cooperative Election Study (CES). We find that ChatGPT is effective at anticipating both the mean level and distribution of public opinion on a variety of policy issues such as abortion bans and approval of the US Supreme Court, particularly in their ideological breakdown (correlation typically >85%). However, it is less successful at anticipating demographic-level differences. Moreover, ChatGPT tends to overgeneralize to new policy issues that arose after its training data was collected, such as US support for involvement in the war in Ukraine. Our work has implications for our understanding of the strengths and limitations of the current generation of AI chatbots as virtual publics or online listening platforms, for future directions in LLM development, and for applications of AI tools to the political domain. (Abridged)