Large language models, sometimes referred to as foundation models, have transformed multiple fields of research. However, smaller languages risk falling behind because training costs are high and large companies have little incentive to train models for them. To address this, the Danish Foundation Models project seeks to provide and maintain open, well-documented, and high-quality foundation models for the Danish language. It does so through broad cooperation with public and private institutions, which helps ensure high data quality and the applicability of the trained models. We present the motivation for the project, its current status, and future perspectives.