Language models (LMs) trained on web-scale datasets owe much of their success to their ability to memorize vast amounts of training data, even information that appears in only a few examples. This capability is often desirable in evaluations on tasks such as question answering, but it raises the question of whether these models exhibit genuine reasoning or merely mimic patterns from the training data. The distinction is particularly salient in forecasting tasks, where the answer is absent from the training data and the model must reason to make logical deductions. We present Reasoning and Tools for Forecasting (RTF), a framework of reasoning-and-acting (ReAct) agents that can dynamically retrieve up-to-date information and run numerical simulations with equipped tools. We evaluate our model on questions from competitive forecasting platforms and demonstrate that our method is competitive with, and can outperform, human predictions. This suggests that LMs, given the right tools, can indeed think and adapt like humans, offering valuable insights for real-world decision-making.
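To make the described agent loop concrete, here is a minimal, hypothetical sketch of a ReAct-style forecasting cycle. This is not the authors' implementation: the tool names (`search_news`, `run_simulation`), the scripted thought/action trace, and the toy election scenario are all illustrative assumptions; a real RTF agent would let an LM choose actions and would call live retrieval APIs.

```python
import random

# Illustrative stand-ins for the agent's tools (assumptions, not the paper's API).

def search_news(query):
    """Stub retrieval tool: returns canned evidence for the question."""
    return "Polls show the incumbent leading by 4 points with 6% undecided."

def run_simulation(lead, undecided, n=10_000, seed=0):
    """Stub numerical tool: Monte Carlo over how undecided voters break.

    Assumes the net swing from undecided voters is uniform in
    [-undecided, +undecided] percentage points.
    """
    rng = random.Random(seed)
    wins = 0
    for _ in range(n):
        swing = rng.uniform(-undecided, undecided)
        if lead + swing > 0:
            wins += 1
    return wins / n

def react_forecast(question):
    """Run a fixed Thought -> Action -> Observation trace, then answer.

    A real ReAct agent would generate each Thought/Action with an LM;
    here the trace is hard-coded to show the loop's shape.
    """
    trace = []
    trace.append(("Thought", f"I need current evidence for: {question}"))
    obs = search_news(question)
    trace.append(("Action", "search_news"))
    trace.append(("Observation", obs))
    trace.append(("Thought", "Quantify the uncertainty with a simulation."))
    prob = run_simulation(lead=4.0, undecided=6.0)
    trace.append(("Action", "run_simulation"))
    trace.append(("Observation", f"P(win) ~= {prob:.2f}"))
    return prob, trace

prob, trace = react_forecast("Will the incumbent win re-election?")
print(f"Forecast probability: {prob:.2f}")
```

Under the uniform-swing assumption the agent wins whenever the swing exceeds -4 points, so the simulated probability settles near 10/12 ≈ 0.83; the point is the interleaving of retrieval and computation, not the toy numbers.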