We explore the capabilities of an augmented democracy system built on off-the-shelf LLMs fine-tuned on data summarizing individual preferences across 67 policy proposals, collected during the 2022 Brazilian presidential elections. We use a train-test cross-validation setup to estimate how accurately the LLMs predict both a subject's individual political choices and the aggregate preferences of the full sample of participants. At the individual level, the accuracy of out-of-sample predictions lies in the range 69% to 76%, and the predictions are significantly better for liberal and college-educated participants. At the population level, we aggregate preferences using an adaptation of the Borda score and compare the rankings of policy proposals obtained from a probabilistic sample of participants and from data augmented using LLMs. We find that the augmented data predicts the preferences of the full population of participants better than probabilistic samples alone when these samples represent less than 30% to 40% of the total population. These results indicate that LLMs are potentially useful for the construction of systems of augmented democracy.
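The population-level comparison can be illustrated with a small sketch. The abstract does not specify the Borda adaptation, so this example assumes that each participant's data consists of pairwise choices between proposals, and it scores each proposal by the share of its comparisons that it won (a normalized Borda-style count). Agreement between two rankings is measured here with Spearman's rank correlation, which is also an assumption. The function names `borda_scores`, `ranking`, and `spearman`, as well as the toy data, are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical data format: one pairwise choice is (winner_id, loser_id).
Choice = Tuple[str, str]


def borda_scores(choices: Iterable[Choice]) -> Dict[str, float]:
    """Share of its pairwise comparisons each proposal won (normalized Borda-style score)."""
    wins: Dict[str, int] = defaultdict(int)
    appearances: Dict[str, int] = defaultdict(int)
    for winner, loser in choices:
        wins[winner] += 1
        appearances[winner] += 1
        appearances[loser] += 1
    return {p: wins[p] / appearances[p] for p in appearances}


def ranking(scores: Dict[str, float]) -> List[str]:
    """Proposals ordered from most to least preferred; ties broken by id."""
    return sorted(scores, key=lambda p: (-scores[p], p))


def spearman(rank_a: List[str], rank_b: List[str]) -> float:
    """Spearman rank correlation between two tie-free rankings of the same items."""
    n = len(rank_a)
    pos_b = {p: i for i, p in enumerate(rank_b)}
    d2 = sum((i - pos_b[p]) ** 2 for i, p in enumerate(rank_a))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))


if __name__ == "__main__":
    # Toy choices from the full population (hypothetical).
    full = [("A", "B"), ("A", "C"), ("B", "C"), ("A", "B"), ("C", "B")]
    # Choices from a small probabilistic sample of participants.
    sample = [("A", "B"), ("B", "C")]
    # Hypothetical LLM-predicted choices for participants outside the sample.
    predicted = [("A", "C"), ("C", "B"), ("C", "B")]
    augmented = sample + predicted

    full_rank = ranking(borda_scores(full))
    print(full_rank, spearman(full_rank, ranking(borda_scores(sample))))
    print(spearman(full_rank, ranking(borda_scores(augmented))))
```

In this toy case the full population ranks the proposals A, C, B; the sample alone yields A, B, C (Spearman 0.5), while the augmented data recovers A, C, B (Spearman 1.0). This mirrors, in miniature, the kind of comparison the abstract describes; it is not a reproduction of the study's method or results.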