Large language models (LLMs) have achieved remarkable success in NLP, multimodal, and other tasks. Despite these successes, two main challenges remain in developing LLMs: (i) high computational cost, and (ii) fair and objective evaluation. In this paper, we report a solution that significantly reduces LLM training cost through a growth strategy. We demonstrate that a 101B-parameter LLM can be trained on 0.31T tokens with a budget of 100K US dollars. Inspired by IQ tests, we also consolidate an additional range of evaluations on top of existing evaluations that focus on knowledge-oriented abilities. These IQ evaluations cover symbolic mapping, rule understanding, pattern mining, and anti-interference, and are designed to minimize the potential impact of memorization. Experimental results show that our model, named FLM-101B, achieves performance comparable to powerful and well-known models such as GPT-3 and GLM-130B, especially on the additional IQ evaluations. The FLM-101B checkpoint is released at https://huggingface.co/CofeAI/FLM-101B.