Most existing time series classification methods adopt a discriminative paradigm that maps input sequences directly to one-hot encoded class labels. While effective, this paradigm struggles to incorporate contextual features and cannot capture semantic relationships among classes. To address these limitations, we propose InstructTime, a framework that reformulates time series classification as a multimodal generative task. Continuous numerical sequences, contextual textual features, and task instructions are treated as multimodal inputs, and fine-tuned language models generate the class labels as textual outputs. To bridge the modality gap, InstructTime introduces three components. A time series discretization module converts continuous sequences into discrete temporal tokens. An alignment projection layer and a generative self-supervised pre-training strategy then strengthen cross-modal representation alignment. Building on this framework, we further propose InstructTime++, which extends InstructTime with implicit feature modeling to compensate for the limited inductive bias of language models on time series. InstructTime++ uses specialized toolkits, including statistical feature extraction and vision-language-based image captioning, to mine informative implicit patterns from raw time series and contextual inputs. It then translates these patterns into textual descriptions so that they can be fed directly into the language model's input. Extensive experiments on multiple benchmark datasets demonstrate the superior performance of InstructTime++.
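The abstract does not specify how the discretization module or the feature toolkits work internally. The sketch below illustrates the general idea of turning a numeric series into text-model input. It assumes a VQ-style nearest-codebook lookup over fixed-length patches, which is one common way to tokenize time series and not necessarily the method InstructTime uses. The resulting tokens are combined with contextual text, a simple statistical description, and a task instruction into a single prompt, from which a language model would then generate the class label as text. Several elements are hypothetical and exist only for illustration: the function names `patchify`, `quantize`, `describe_stats`, and `build_prompt`, the `<ts_k>` token format, and the prompt template. The codebook is also hand-specified here, whereas a real tokenizer would learn it.

```python
import math
from typing import List, Sequence


def patchify(series: Sequence[float], patch_len: int) -> List[List[float]]:
    """Split a 1-D series into non-overlapping patches; a trailing remainder is dropped."""
    n = len(series) // patch_len
    return [list(series[i * patch_len:(i + 1) * patch_len]) for i in range(n)]


def quantize(patches: List[List[float]], codebook: List[List[float]]) -> List[int]:
    """Assign each patch the index of its nearest codebook vector (squared Euclidean distance).

    Assumption: a VQ-style lookup; the codebook is given here, not learned.
    """
    tokens = []
    for p in patches:
        dists = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in codebook]
        tokens.append(min(range(len(codebook)), key=dists.__getitem__))
    return tokens


def describe_stats(series: Sequence[float]) -> str:
    """Render simple statistics as text (hypothetical stand-in for a feature-extraction toolkit)."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)  # population std
    return f"mean={mean:.2f}, std={std:.2f}, min={min(series):.2f}, max={max(series):.2f}"


def build_prompt(series: Sequence[float], codebook: List[List[float]], patch_len: int,
                 context: str, instruction: str) -> str:
    """Combine instruction, context, discrete tokens, and statistics into one text prompt.

    The <ts_k> token format and the template are hypothetical.
    """
    tokens = quantize(patchify(series, patch_len), codebook)
    token_str = " ".join(f"<ts_{t}>" for t in tokens)
    return (f"Instruction: {instruction}\n"
            f"Context: {context}\n"
            f"Time series tokens: {token_str}\n"
            f"Statistics: {describe_stats(series)}\n"
            f"Answer:")


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]  # toy 2-D codebook with 3 entries
    series = [0.1, -0.1, 0.9, 1.2, 0.0, 0.8]
    print(build_prompt(series, codebook, patch_len=2,
                       context="single-lead sensor recording",
                       instruction="Classify the time series."))
```

With this toy codebook, the three patches map to tokens `<ts_0> <ts_1> <ts_2>`. A language model fine-tuned on such prompts would be trained to continue after `Answer:` with the class name. Because the label is text rather than a one-hot index, semantically related classes can share vocabulary.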