The recent rapid development of language models (LMs) has attracted attention in the field of time series, including multimodal time series modeling. However, we note that current multimodal time series methods are biased, often assigning a primary role to one modality while the other assumes a secondary role. They overlook the mutual benefits and complementarity of different modalities. For example, in seizure diagnosis, relying solely on textual clinical reports makes it difficult to pinpoint the area and type of the disease, while electroencephalograms (EEGs) alone cannot provide an accurate diagnosis without considering the symptoms. In this study, based on complementary information mining of multimodal time series data, we propose DualTime, a Dual-adapter multimodal language model for Time series representation that implements temporal-primary and textual-primary modeling simultaneously. By injecting lightweight adaptation tokens, the LM pipeline shared by the dual adapters encourages embedding alignment and achieves efficient fine-tuning. Empirically, our method outperforms state-of-the-art models in both supervised and unsupervised settings, highlighting the complementary benefits of different modalities. In addition, we conduct few-shot label transfer experiments, which further verify the transferability and expressiveness of our proposed DualTime.
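The core mechanism described above, two lightweight adapters that inject learnable adaptation tokens into a single frozen LM backbone, can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact implementation: the transformer backbone, the class names (`Adapter`, `DualAdapterLM`), the token count, and the mean-pooling fusion are all assumptions made for the example.

```python
import torch
import torch.nn as nn

class Adapter(nn.Module):
    """Prepends a small set of learnable adaptation tokens to an embedding sequence."""
    def __init__(self, n_tokens: int, d_model: int):
        super().__init__()
        self.tokens = nn.Parameter(torch.randn(n_tokens, d_model) * 0.02)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, d_model) -> (batch, n_tokens + seq_len, d_model)
        batch = x.size(0)
        return torch.cat([self.tokens.expand(batch, -1, -1), x], dim=1)

class DualAdapterLM(nn.Module):
    """Two adapters (textual-primary and temporal-primary) share one frozen backbone,
    so only the adaptation tokens are updated during fine-tuning."""
    def __init__(self, d_model: int = 64, n_tokens: int = 4):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model, nhead=4, batch_first=True)
        self.backbone = nn.TransformerEncoder(layer, num_layers=2)
        for p in self.backbone.parameters():
            p.requires_grad = False  # backbone stays frozen; adapters are trained
        self.text_adapter = Adapter(n_tokens, d_model)  # textual-primary branch
        self.ts_adapter = Adapter(n_tokens, d_model)    # temporal-primary branch

    def forward(self, text_emb: torch.Tensor, ts_emb: torch.Tensor) -> torch.Tensor:
        h_text = self.backbone(self.text_adapter(text_emb))
        h_ts = self.backbone(self.ts_adapter(ts_emb))
        # Fuse the two views by mean-pooling each sequence and averaging
        # (an illustrative fusion choice, not necessarily the paper's).
        return (h_text.mean(dim=1) + h_ts.mean(dim=1)) / 2

model = DualAdapterLM()
text = torch.randn(2, 10, 64)  # batch of 2 clinical-report embeddings
ts = torch.randn(2, 32, 64)    # batch of 2 EEG segment embeddings
rep = model(text, ts)
print(rep.shape)
```

Because the backbone is shared and frozen, the trainable state is only the two small token matrices, which is what makes this style of fine-tuning lightweight while pushing both modalities' embeddings through the same pipeline.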