We investigate whether Large Language Models (LLMs) can track public opinion as measured by favorability polls during the 2024 U.S. presidential election cycle. Our analysis focuses on headline favorability (e.g., "Favorable" vs. "Unfavorable") of the presidential candidates across multiple LLMs queried daily throughout the election season. Using the publicly available llm-election-data-2024 dataset, we evaluate predictions from nine LLM configurations against a curated set of five high-quality polls from major organizations: Reuters, CNN, Gallup, Quinnipiac, and ABC. We find systematic directional miscalibration. For Kamala Harris, all models overpredict favorability by 10-40% relative to polls. For Donald Trump, biases are smaller (5-10%) and poll-dependent, with substantially lower cross-model variation. These deviations persist under temporal smoothing and are not corrected by internet-augmented retrieval. We conclude that off-the-shelf LLMs do not reliably track polls when queried directly, and we discuss implications for election forecasting.
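A minimal sketch of the bias computation the abstract describes, assuming a simple tabular schema. The file names, column names (`date`, `model`, `candidate`, `favorable_pct`), and the 7-day smoothing window are hypothetical illustrations, not details of the llm-election-data-2024 release.

```python
import pandas as pd

# Hypothetical schema: daily LLM favorability predictions and curated poll
# favorability, both as percentages. Adjust names to the actual dataset.
llm = pd.read_csv("llm-election-data-2024.csv", parse_dates=["date"])
polls = pd.read_csv("polls.csv", parse_dates=["date"])

# Align model predictions with polls on date and candidate.
merged = llm.merge(polls, on=["date", "candidate"], suffixes=("_llm", "_poll"))

# Directional bias: positive values mean the model overpredicts favorability
# relative to the poll.
merged["bias"] = merged["favorable_pct_llm"] - merged["favorable_pct_poll"]

# Temporal smoothing: 7-day rolling mean of the bias per model/candidate
# (window length is an assumption), to check whether deviations persist.
merged = merged.sort_values("date")
merged["bias_smoothed"] = (
    merged.groupby(["model", "candidate"])["bias"]
          .transform(lambda s: s.rolling(7, min_periods=1).mean())
)

# Mean raw bias per model and candidate.
print(merged.groupby(["model", "candidate"])["bias"].mean())
```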