Language models (LMs) exhibit remarkable abilities to solve new tasks from just a few examples or textual instructions, especially at scale. They also, paradoxically, struggle with basic functionality, such as arithmetic or factual lookup, where much simpler and smaller models excel. In this paper, we show that LMs can teach themselves to use external tools via simple APIs and achieve the best of both worlds. We introduce Toolformer, a model trained to decide which APIs to call, when to call them, what arguments to pass, and how to best incorporate the results into future token prediction. This is done in a self-supervised way, requiring nothing more than a handful of demonstrations for each API. We incorporate a range of tools, including a calculator, a Q&A system, two different search engines, a translation system, and a calendar. Toolformer achieves substantially improved zero-shot performance across a variety of downstream tasks, often competitive with much larger models, without sacrificing its core language modeling abilities.
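To make the idea of incorporating API results into the text concrete, the following is a minimal sketch of the execution step only: it scans generated text for inline tool calls, runs them, and splices the results back in. The abstract does not specify a call format, so the bracketed `[Tool(args)]` syntax, the `->` result marker, and the names `calculator`, `TOOLS`, and `execute_calls` are all assumptions made for illustration. The self-supervised training procedure is not shown.

```python
import ast
import operator
import re

# Hypothetical inline call syntax (an assumption): [ToolName(arguments)]
CALL_PATTERN = re.compile(r"\[(\w+)\((.*?)\)\]")

# Arithmetic operators the illustrative calculator supports.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Evaluate +, -, *, / over numeric literals and format to two decimals."""

    def ev(node):
        if (
            isinstance(node, ast.Constant)
            and isinstance(node.value, (int, float))
            and not isinstance(node.value, bool)
        ):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -ev(node.operand)
        raise ValueError(f"unsupported expression: {expression!r}")

    tree = ast.parse(expression.strip(), mode="eval")
    return f"{ev(tree.body):.2f}"


# Hypothetical tool registry; a fuller system would also register
# question answering, search, translation, and calendar tools here.
TOOLS = {"Calculator": calculator}


def execute_calls(text: str, tools=TOOLS) -> str:
    """Replace each recognized inline call with the call plus its result."""

    def replace(match):
        name, args = match.group(1), match.group(2)
        tool = tools.get(name)
        if tool is None:
            return match.group(0)  # leave unknown calls untouched
        return f"[{name}({args}) -> {tool(args)}]"

    return CALL_PATTERN.sub(replace, text)
```

For instance, `execute_calls("400 of 1400 participants [Calculator(400 / 1400)] passed.")` annotates the call with `0.29`, so a model conditioning on the text that follows can use the computed value rather than performing the arithmetic itself.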