The number of Language Models (LMs) dedicated to processing scientific text is on the rise. Keeping pace with the rapid growth of scientific LMs (SciLMs) has become a daunting task for researchers. To date, no comprehensive survey of SciLMs has been undertaken, leaving this issue unaddressed. Given the constant stream of new SciLMs, the state of the art and how these models compare to each other remain largely unknown. This work fills that gap and provides a comprehensive review of SciLMs, including an extensive analysis of their effectiveness across different domains, tasks, and datasets, and a discussion of the challenges that lie ahead.