Large language models (LLMs) have been widely evaluated as simulators of individual survey responses. In practice, however, fully unobserved responses are rare; the dominant problem is partial non-response. Imputation aims to restore the overall structure of a survey dataset by filling in these missing values. It has its own well-defined evaluation criteria and differs fundamentally from prediction. We propose imputing missing survey data through in-context learning (ICL). We systematically evaluate ICL design choices under three missingness mechanisms (missing completely at random, MCAR; missing at random, MAR; missing not at random, MNAR) on 150 opinion variables spanning 15 waves of the American Trends Panel. Compared with well-established statistical imputation methods such as multiple imputation by chained equations with predictive mean matching (MICE PMM), our ICL approach consistently reduces absolute error under all missingness mechanisms, with the largest gains under MNAR. Notably, the best-performing specification (gpt-oss-120b with 100 in-context examples) achieves near-nominal aggregate coverage (approaching the 95% level) with confidence intervals two to five times narrower than those of MICE PMM. We publish a Python package with a scikit-learn-like API that makes it easy to deploy our method with both local and proprietary LLMs.
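The three missingness mechanisms differ only in what the probability of a value being removed depends on. The sketch below makes that distinction concrete. It is a hypothetical helper (`make_missing`), not part of the authors' package. It assumes a single numeric target column `y` (e.g. a coded opinion item) and one fully observed covariate `x`, and weights selection by the exponential of a standardized value. That weighting is an arbitrary illustrative choice, not the paper's simulation design.

```python
import numpy as np


def _zscore(v):
    """Standardize a 1-D array; the epsilon guards against zero variance."""
    v = np.asarray(v, dtype=float)
    return (v - v.mean()) / (v.std() + 1e-12)


def make_missing(y, x, rate=0.2, mechanism="MCAR", rng=None):
    """Hypothetical helper: return a boolean mask (True = value removed) for y.

    MCAR: removal is independent of everything.
    MAR:  removal depends only on the observed covariate x.
    MNAR: removal depends on the (to-be-hidden) value of y itself.
    """
    rng = np.random.default_rng(rng)
    n = len(y)
    k = int(round(rate * n))
    if mechanism == "MCAR":
        p = None  # uniform selection
    elif mechanism == "MAR":
        w = np.exp(_zscore(x))  # higher x -> more likely to be missing
        p = w / w.sum()
    elif mechanism == "MNAR":
        w = np.exp(_zscore(y))  # higher y -> more likely to be missing
        p = w / w.sum()
    else:
        raise ValueError(f"unknown mechanism: {mechanism!r}")
    # Draw exactly k distinct respondents, weighted by p when given.
    idx = rng.choice(n, size=k, replace=False, p=p)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask
```

Under MNAR the respondents who lose their answer systematically hold higher values of `y`, so any method that only models the observed values is biased. This is the setting where the abstract reports the largest gains over MICE PMM.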