In this paper, we present Paramanu-Ganita, a novel 208 million parameter autoregressive (AR) decoder-based language model for mathematics. The model is pretrained from scratch at a context size of 4096 on our curated, mixed mathematical corpus. We evaluate the model on both the perplexity metric and the GSM8k mathematical benchmark. Despite being 35 times smaller than 7B LLMs, Paramanu-Ganita outperformed generalist LLMs on GSM8k test accuracy: LLaMa-1 7B by 28.4 percentage points, LLaMa-2 7B by 27.6 points, Falcon 7B by 32.6 points, and PaLM 8B by 35.3 points; it also outperformed math-specialised LLMs such as Minerva 8B by 23.2 points and LLEMMA 7B by 3.0 points. Paramanu-Ganita further outperformed much larger LLMs: PaLM 62B by 6.4 points, Falcon 40B by 19.8 points, LLaMa-1 33B by 3.8 points, and Vicuna 13B by 11.8 points. These large, significant margins over existing LLMs signify that the reasoning capabilities of a language model are not restricted to LLMs with a humongous number of parameters. Paramanu-Ganita took 146 A100 hours to train, whereas the math-specialised LLM LLEMMA 7B required the equivalent of 23,000 A100 hours. Thus, our approach of pretraining powerful domain-specialised language models from scratch is much more cost-effective for domain adaptation than continually training existing LLMs. Hence, we conclude that strong mathematical reasoning abilities in a language model do not require giant LLMs and immense computing power. Finally, we note that we have trained Paramanu-Ganita on only a part of our entire mathematical corpus, and the full potential of our model is yet to be explored.
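As a rough back-of-the-envelope comparison based only on the compute figures reported above (this is our arithmetic, not a controlled like-for-like benchmark of training pipelines), the gap in training cost works out to roughly two orders of magnitude:

\[
\frac{23{,}000~\text{A100 hours (LLEMMA 7B)}}{146~\text{A100 hours (Paramanu-Ganita)}} \approx 157.5\times
\]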