The current trend in improving language model performance is to scale up the number of parameters (e.g., the state-of-the-art GPT-4 model reportedly has approximately 1.7 trillion parameters) or the amount of training data fed into the model. However, this comes at a significant cost in computational resources and energy consumption, compromising the sustainability of AI solutions, and carries risks relating to privacy and misuse. In this paper we present the Erasmian Language Model (ELM), a small, context-specific, 900-million-parameter model pre-trained and fine-tuned by and for Erasmus University Rotterdam. We show that the model performs adequately in a classroom context for essay writing, and that it achieves superior performance in subjects that are part of its context. This has implications for a wide range of institutions and organizations, showing that context-specific language models may be a viable alternative for resource-constrained, privacy-sensitive use cases.