Forecasting future events is important for policy and decision making. In this work, we study whether language models (LMs) can forecast at the level of competitive human forecasters. Towards this goal, we develop a retrieval-augmented LM system designed to automatically search for relevant information, generate forecasts, and aggregate predictions. To facilitate our study, we collect a large dataset of questions from competitive forecasting platforms. On a test set published after the knowledge cut-offs of our LMs, we evaluate the end-to-end performance of our system against the aggregates of human forecasts. On average, the system nears the crowd aggregate of competitive forecasters, and in some settings surpasses it. Our work suggests that using LMs to forecast the future could provide accurate predictions at scale and help inform institutional decision making.
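The abstract does not specify how individual forecasts are combined or how the system is scored against the crowd. As a minimal sketch, assuming binary questions, several sampled LM probabilities per question combined with a trimmed mean, and comparison by mean Brier score (all three are illustrative assumptions, not the authors' stated method), the final aggregation and evaluation steps could look like this. The function names `trimmed_mean`, `brier_score`, and `mean_brier` are hypothetical.

```python
from statistics import mean


def trimmed_mean(probs, trim_frac=0.1):
    """Hypothetical aggregator (assumption): average the forecasts after
    dropping the lowest and highest trim_frac fraction of them."""
    if not probs:
        raise ValueError("need at least one forecast")
    if not 0.0 <= trim_frac < 0.5:
        raise ValueError("trim_frac must be in [0, 0.5)")
    ordered = sorted(probs)
    k = int(len(ordered) * trim_frac)  # number of forecasts dropped from each end
    kept = ordered[k:len(ordered) - k] or ordered  # fall back if trimming empties the list
    return mean(kept)


def brier_score(prob, outcome):
    """Squared error between a probability and a 0/1 outcome; lower is better."""
    return (prob - outcome) ** 2


def mean_brier(forecasts, outcomes):
    """Average Brier score over a set of resolved questions."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return mean(brier_score(p, o) for p, o in zip(forecasts, outcomes))


# Usage with made-up numbers: two questions, five LM samples each.
lm_samples = [[0.6, 0.7, 0.65, 0.9, 0.1], [0.2, 0.25, 0.3, 0.15, 0.8]]
system = [trimmed_mean(s, trim_frac=0.2) for s in lm_samples]
crowd = [0.7, 0.1]   # hypothetical crowd aggregates
outcomes = [1, 0]    # how each question resolved
print(mean_brier(system, outcomes), mean_brier(crowd, outcomes))
```

Trimming discards outlying samples before averaging, so a single badly reasoned LM output has less influence on the aggregate than it would under a plain mean.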