Inspired by recent advancements in large language models (LLMs) for natural language processing (NLP), there has been a surge of research on foundation models for time series forecasting. One approach trains LLM architectures on tokenized time series data using cross-entropy loss. Although this method has shown promising results, cross-entropy loss is designed for classification and does not account for the distance between classes, even though tokens obtained by quantizing a time series carry a natural ordering. To address this limitation, we propose using the Wasserstein loss for such architectures. To validate our approach, we fine-tuned a foundation time series model on $22$ zero-shot datasets, comparing the performance of cross-entropy loss with that of Wasserstein loss. Our results demonstrate that replacing cross-entropy loss with Wasserstein loss significantly improves point estimation.
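To make the proposed objective concrete, the sketch below shows one way such a loss could be computed; it is a minimal illustration, not the paper's implementation. It assumes the model predicts a categorical distribution over ordered quantization bins with known centers, and it evaluates the 1-Wasserstein distance between the predicted distribution and the one-hot target via their cumulative distribution functions (CDFs). All names (`wasserstein1_loss`, `bin_centers`) are hypothetical.

```python
import torch
import torch.nn.functional as F

def wasserstein1_loss(logits: torch.Tensor,
                      target_ids: torch.Tensor,
                      bin_centers: torch.Tensor) -> torch.Tensor:
    """1-Wasserstein loss between the predicted distribution over ordered
    value bins and a one-hot target (hypothetical sketch).

    For distributions on ordered supports c_1 < ... < c_K, the CDF formulation
    gives W1(p, q) = sum_k |P(c_k) - Q(c_k)| * (c_{k+1} - c_k).
    """
    probs = F.softmax(logits, dim=-1)                        # (batch, num_bins)
    target = F.one_hot(target_ids, logits.size(-1)).to(probs.dtype)
    cdf_pred = torch.cumsum(probs, dim=-1)
    cdf_true = torch.cumsum(target, dim=-1)
    widths = bin_centers[1:] - bin_centers[:-1]              # (num_bins - 1,)
    # The final CDF entry is 1 for both distributions, so it drops out.
    return ((cdf_pred - cdf_true).abs()[..., :-1] * widths).sum(dim=-1).mean()

# Illustrative usage with made-up shapes and bin range:
num_bins = 512
logits = torch.randn(8, num_bins, requires_grad=True)   # model outputs
target_ids = torch.randint(0, num_bins, (8,))           # quantized targets
bin_centers = torch.linspace(-15.0, 15.0, num_bins)
loss = wasserstein1_loss(logits, target_ids, bin_centers)
loss.backward()  # differentiable, so usable as a drop-in training objective
```

Unlike cross-entropy, which penalizes only the probability assigned to the correct token, this formulation penalizes mass placed on bins in proportion to how far they lie from the target bin, which is the property motivating the switch.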