Large language models are increasingly used to predict human preferences in both scientific and business endeavors, yet current approaches rely exclusively on analyzing model outputs without considering the underlying mechanisms. Using election forecasting as a test case, we introduce mechanistic forecasting, a method demonstrating that probing internal model representations offers a fundamentally different, and sometimes more effective, approach to preference prediction. Examining over 24 million configurations across 7 models, 6 national elections, multiple persona attributes, and prompt variations, we systematically analyze how demographic and ideological information activates latent party-encoding components within each model. We find that leveraging this internal knowledge via mechanistic forecasting (as opposed to relying solely on surface-level predictions) can improve prediction accuracy. The effects vary across demographic versus opinion-based attributes, political parties, national contexts, and models. Our findings demonstrate that the latent representational structure of LLMs contains systematic, exploitable information about human preferences, establishing a new path for using language models in social science prediction tasks.