Recently, there has been growing interest in globally distributed training, which promises both to reduce training costs and to democratize participation in building large-scale foundation models. However, existing models trained in a globally distributed manner are relatively small in scale and have only been trained with whitelisted participants, so they do not yet realize the full promise of democratized participation. In this report, we describe Covenant-72B, an LLM produced by the largest collaborative globally distributed pre-training run (in terms of both compute and model scale), which simultaneously allowed open, permissionless participation, supported by a live blockchain protocol. We used SparseLoCo, a state-of-the-art communication-efficient optimizer that supports dynamic participation, with peers free to join and leave throughout training. Our model, pre-trained on approximately 1.1T tokens, performs competitively with fully centralized models pre-trained on similar or larger compute budgets, demonstrating that fully democratized, non-whitelisted participation is not only feasible but achievable at unprecedented scale for a globally distributed pre-training run.
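The abstract names SparseLoCo but does not spell out how a communication-efficient outer step with dynamic participation works. The sketch below is a rough illustration only, not the authors' implementation: it assumes a DiLoCo-style setup in which each peer runs several local optimizer steps and then communicates a top-k-sparsified "pseudo-gradient" (global minus local parameters) with error feedback, while the aggregator averages updates from whichever peers reported that round. All names (`sparsify`, `peer_pseudo_gradient`, `outer_step`) and parameter choices (`top_k_fraction`, `outer_lr`) are illustrative assumptions.

```python
# Minimal sketch of a SparseLoCo-style synchronization round (assumptions
# labeled above; this is not the Covenant-72B training code).
import torch


def sparsify(delta: torch.Tensor, top_k_fraction: float = 0.01) -> torch.Tensor:
    """Keep only the largest-magnitude entries of `delta`; zero out the rest."""
    flat = delta.flatten()
    k = max(1, int(top_k_fraction * flat.numel()))
    idx = flat.abs().topk(k).indices
    out = torch.zeros_like(flat)
    out[idx] = flat[idx]
    return out.view_as(delta)


def peer_pseudo_gradient(global_p, local_p, error_buf, top_k_fraction=0.01):
    """One peer's communicated update: sparsified (global - local) plus error feedback."""
    sparse, new_err = {}, {}
    for name, g in global_p.items():
        delta = g - local_p[name] + error_buf.get(name, torch.zeros_like(g))
        s = sparsify(delta, top_k_fraction)
        sparse[name] = s           # only this (compressed) tensor is transmitted
        new_err[name] = delta - s  # residual kept locally for the next round
    return sparse, new_err


def outer_step(global_p, peer_updates, outer_lr=0.7):
    """Average pseudo-gradients from whichever peers reported, then apply them."""
    if not peer_updates:  # no peers this round: parameters stay unchanged
        return global_p
    return {
        name: g - outer_lr * torch.stack([u[name] for u in peer_updates]).mean(dim=0)
        for name, g in global_p.items()
    }


# Toy usage: two peers drift from the shared parameters, then synchronize.
global_p = {"w": torch.zeros(10)}
peers = [{"w": torch.randn(10)} for _ in range(2)]  # stand-ins for local training
errs = [{} for _ in peers]
updates = []
for local_p, err in zip(peers, errs):
    u, new_err = peer_pseudo_gradient(global_p, local_p, err)
    err.update(new_err)
    updates.append(u)
global_p = outer_step(global_p, updates)
```

Because only the top-k entries of each pseudo-gradient are communicated, per-round traffic shrinks by orders of magnitude relative to exchanging dense gradients, and because `outer_step` averages over whatever subset of peers reported, the round tolerates peers joining or dropping out freely.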