Large language models (LLMs) that produce human-like responses have begun to revolutionize research practices in the social sciences. This paper shows how LLMs and social surveys can be integrated to accurately predict individual responses to survey questions that were not previously asked. We develop a novel methodological framework that personalizes LLMs by fine-tuning them on survey data, taking into account the meaning of survey questions derived from their text, the latent beliefs of individuals inferred from their response patterns, and the temporal contexts across different survey periods. Using the General Social Survey from 1972 to 2021, we show that the fine-tuned model based on Alpaca-7b can predict individual responses both when responses to a survey question are partially missing and when they are entirely missing. This predictive capability allows us to fill in missing trends with high confidence and pinpoint when public attitudes changed, such as the rising support for same-sex marriage. We discuss practical constraints, socio-demographic representation, and ethical concerns regarding individual autonomy and privacy when using LLMs for opinion prediction. This study demonstrates that LLMs and surveys can mutually enhance each other's capabilities: LLMs broaden survey potential, while surveys improve the alignment of LLMs.