The field of general time series analysis has recently begun to explore unified modeling, where a common architectural backbone can be retrained on a specific task for a specific dataset. In this work, we approach unification from a complementary vantage point: unification across tasks and domains. To this end, we explore the impact of discrete, learned time series data representations that enable generalist, cross-domain training. Our method, TOTEM (TOkenized Time Series EMbeddings), uses a simple tokenizer architecture that embeds time series data from varying domains using a discrete vectorized representation learned in a self-supervised manner. TOTEM works across multiple tasks and domains with minimal to no tuning. We study the efficacy of TOTEM with an extensive evaluation on 17 real-world time series datasets across 3 tasks. We evaluate both the specialist (i.e., training a model on each domain) and generalist (i.e., training a single model on many domains) settings, and show that TOTEM matches or outperforms previous best methods on several popular benchmarks. The code can be found at: https://github.com/SaberaTalukder/TOTEM.
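To make the idea of a discrete, tokenized time series representation concrete, the following is a minimal sketch and not the TOTEM implementation. It assumes the tokenizer can be approximated by a nearest-neighbor codebook lookup over fixed-length patches, in the style of vector quantization. The function names `tokenize` and `detokenize`, the patch length, and the hand-picked codebook are all hypothetical; in a learned tokenizer the codebook would be trained in a self-supervised manner rather than fixed by hand.

```python
from typing import List, Sequence


def tokenize(series: Sequence[float],
             codebook: Sequence[Sequence[float]],
             patch_len: int) -> List[int]:
    """Map each non-overlapping patch to the index of its nearest codebook vector.

    Assumption: squared Euclidean distance is the matching criterion.
    Trailing samples that do not fill a whole patch are dropped.
    """
    tokens = []
    for start in range(0, len(series) - patch_len + 1, patch_len):
        patch = series[start:start + patch_len]
        # Distance from this patch to every codebook entry.
        dists = [sum((p - c) ** 2 for p, c in zip(patch, code))
                 for code in codebook]
        tokens.append(dists.index(min(dists)))
    return tokens


def detokenize(tokens: Sequence[int],
               codebook: Sequence[Sequence[float]]) -> List[float]:
    """Reconstruct an approximate series by concatenating codebook vectors."""
    return [value for t in tokens for value in codebook[t]]


# Hypothetical hand-picked codebook with four codes of length 2.
codebook = [[0.0, 0.0], [1.0, 1.0], [-1.0, -1.0], [-1.0, 1.0]]
series = [0.1, -0.1, 0.9, 1.2, -1.1, -0.8, -0.9, 1.1]

tokens = tokenize(series, codebook, patch_len=2)
print(tokens)                        # [0, 1, 2, 3]
print(detokenize(tokens, codebook))  # [0.0, 0.0, 1.0, 1.0, -1.0, -1.0, -1.0, 1.0]
```

Because the output is a sequence of integer indices into a shared codebook, series from different domains end up in the same vocabulary, which is what would allow a single downstream model to be trained across domains in the generalist setting.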