Political polling is a multibillion-dollar industry with outsized influence on the societal trajectory of the United States and nations around the world. However, it faces growing pressures on its cost, availability, and accuracy. At the same time, artificial intelligence (AI) chatbots, powered by increasingly sophisticated large language models (LLMs), have become compelling stand-ins for human behavior. Could AI chatbots anticipate public opinion on controversial issues well enough to be used by campaigns, interest groups, and polling firms? We have developed a prompt engineering methodology for eliciting human-like survey responses from ChatGPT. Each response simulates how a person described by a set of demographic factors would answer a policy question, and it consists of both an ordinal numeric response score and a textual justification. We execute large-scale experiments, querying for thousands of simulated responses at a cost far lower than that of human surveys. We compare the simulated data to human issue polling data from the Cooperative Election Study (CES). We find that ChatGPT is effective at anticipating both the mean level and the distribution of public opinion on a variety of policy issues, such as abortion bans and approval of the US Supreme Court, particularly in their ideological breakdown (correlation typically >85%). However, it is less successful at anticipating demographic-level differences. Moreover, ChatGPT tends to overgeneralize to new policy issues that arose after its training data was collected, such as US support for involvement in the war in Ukraine. Our work has implications for understanding the strengths and limitations of the current generation of AI chatbots as virtual publics or online listening platforms, for future directions of LLM development, and for applications of AI tools to the political domain. (Abridged)
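The pipeline described in the abstract (a persona-conditioned prompt, a reply carrying an ordinal score plus a justification, and a correlation against human polling data) might be sketched roughly as follows. This is only an illustration under stated assumptions: the prompt wording, the 1-to-5 scale, the `Score:` reply format, and the helper names `build_prompt`, `parse_response`, and `pearson` are all hypothetical and are not taken from the paper. The call to the chatbot itself is deliberately left out.

```python
import re
from statistics import mean


def build_prompt(persona: dict, issue: str, scale_max: int = 5) -> str:
    """Compose a persona-conditioned survey prompt (wording is an assumption)."""
    profile = ", ".join(f"{key}: {value}" for key, value in persona.items())
    return (
        f"Please answer as a survey respondent with this profile: {profile}.\n"
        f"Question: On a scale from 1 (strongly oppose) to {scale_max} "
        f"(strongly support), how do you feel about {issue}?\n"
        "Reply with 'Score: <number>' on the first line, "
        "then a one-sentence justification."
    )


def parse_response(text: str, scale_max: int = 5):
    """Extract (score, justification) from a reply; None if no valid score."""
    match = re.search(r"Score:\s*(\d+)", text)
    if match is None:
        return None
    score = int(match.group(1))
    if not 1 <= score <= scale_max:
        return None
    # Everything after the score is treated as the textual justification.
    return score, text[match.end():].strip()


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical persona; the fields are illustrative, not the study's variables.
prompt = build_prompt({"age": 45, "ideology": "conservative"}, "a national abortion ban")
print(prompt)
```

In such a setup, one would parse each simulated reply with `parse_response`, average the scores within each ideological group, and pass the simulated and human group means to `pearson` to obtain the kind of correlation the abstract reports.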