Language is constantly changing and evolving, so language models quickly become outdated. Consequently, we should continuously update our models with new data to expose them to new events and facts. However, that requires additional computation, which means additional carbon emissions. Do any measurable benefits justify this cost? This paper looks for empirical evidence to support continuous training. We reproduce existing benchmarks and extend them to include additional time periods, models, and tasks. Our results show that the downstream task performance of temporally adapted English models for social media data does not improve over time. Pretrained models without temporal adaptation are actually significantly more effective and efficient. However, we also note a lack of suitable temporal benchmarks. Our findings invite a critical reflection on when and how to temporally adapt language models, accounting for sustainability.