We investigate the reliable use of simulated survey responses from large language models (LLMs) through the lens of uncertainty quantification. Our approach converts synthetic data into confidence sets for population parameters of human responses, addressing the distribution shift between the simulated and real populations. A key innovation lies in determining the optimal number of simulated responses: too many simulated responses produce overly narrow confidence sets with poor coverage, while too few yield needlessly wide ones. To resolve this, our method adaptively selects the simulation sample size, ensuring valid average-case coverage guarantees. It is broadly applicable to any LLM, irrespective of its fidelity, and to any procedure for constructing confidence sets. Additionally, the selected sample size quantifies the degree of misalignment between the LLM and the target human population. We illustrate our method on real datasets and LLMs.
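The trade-off behind the sample-size selection can be seen in a small toy simulation (not the paper's procedure; the human-population mean, the LLM's bias, and the normal-approximation interval below are all assumptions made only for illustration): intervals built from more simulated responses become narrower but, under a distribution shift, cover the true human parameter less often, while intervals from few responses stay valid but wide.

```python
# Toy illustration of the width-vs-coverage trade-off under distribution shift.
# All quantities (true mean, LLM bias, noise scale) are hypothetical.
import numpy as np

rng = np.random.default_rng(0)

true_human_mean = 0.50   # assumed population parameter of human responses
llm_bias = 0.05          # assumed misalignment of the LLM simulator
n_trials = 2000
z = 1.96                 # 95% normal-approximation interval

for m in [50, 200, 1000, 5000]:   # candidate numbers of simulated responses
    covered, widths = 0, []
    for _ in range(n_trials):
        # Simulated responses are drawn from a shifted distribution.
        sims = rng.normal(true_human_mean + llm_bias, 0.25, size=m)
        half = z * sims.std(ddof=1) / np.sqrt(m)
        lo, hi = sims.mean() - half, sims.mean() + half
        covered += (lo <= true_human_mean <= hi)
        widths.append(hi - lo)
    print(f"m={m:5d}  coverage={covered / n_trials:.2f}  mean width={np.mean(widths):.3f}")
```

Running this sketch shows coverage falling well below the nominal 95% as m grows, which is the failure mode the adaptive sample-size selection is designed to guard against.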