This paper proposes a foundation model called "CLaSP" that searches time series signals using natural language queries that describe the characteristics of those signals. Previous efforts to represent time series signal data in natural language have faced three challenges: designing a conventional classification of signal characteristics, formulating how to quantify those characteristics, and creating a dictionary of synonyms. To overcome these limitations, the proposed method introduces a neural network based on contrastive learning. This network is first trained on the TRUCE and SUSHI datasets, which consist of time series signals paired with natural language descriptions. Previous studies have proposed vocabularies that data analysts use to describe signal characteristics, and SUSHI was designed to cover these terms. We believe that a neural network trained on these datasets will enable data analysts to search using this natural language vocabulary. Furthermore, our method requires no dictionary of predefined synonyms; instead, it leverages the common-sense knowledge embedded in a large language model (LLM). Experimental results demonstrate that CLaSP enables natural language search of time series signal data and can accurately learn the points at which signal data changes.
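The abstract does not specify the encoders, loss function, or hyperparameters, so the following is only a minimal sketch of how contrastive training and natural language search over signals could fit together. It assumes a CLIP-style symmetric InfoNCE objective, which may differ from what CLaSP actually uses. The function names (`symmetric_contrastive_loss`, `search`), the temperature value, and the toy embedding vectors are all hypothetical; in a real system the embeddings would come from a signal encoder and an LLM-based text encoder.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def l2_normalize(v: Vector) -> List[float]:
    """Scale a vector to unit length (a zero vector is returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    if norm == 0.0:
        return [0.0 for _ in v]
    return [x / norm for x in v]


def similarity_matrix(signal_embs: Sequence[Vector],
                      text_embs: Sequence[Vector]) -> List[List[float]]:
    """Cosine similarity between every signal embedding and every text embedding."""
    signals = [l2_normalize(v) for v in signal_embs]
    texts = [l2_normalize(v) for v in text_embs]
    return [[sum(a * b for a, b in zip(s, t)) for t in texts] for s in signals]


def _diagonal_cross_entropy(logits: List[List[float]]) -> float:
    """Mean cross-entropy where the correct column for row i is column i."""
    total = 0.0
    for i, row in enumerate(logits):
        row_max = max(row)  # subtract the max for numerical stability
        log_sum_exp = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        total += log_sum_exp - row[i]
    return total / len(logits)


def symmetric_contrastive_loss(signal_embs: Sequence[Vector],
                               text_embs: Sequence[Vector],
                               temperature: float = 0.07) -> float:
    """Assumed CLIP-style loss: pair i of signal and description is the positive;
    every other pairing in the batch acts as a negative."""
    sims = similarity_matrix(signal_embs, text_embs)
    logits = [[x / temperature for x in row] for row in sims]
    logits_t = [list(col) for col in zip(*logits)]  # text-to-signal direction
    return 0.5 * (_diagonal_cross_entropy(logits) + _diagonal_cross_entropy(logits_t))


def search(query_emb: Vector, signal_embs: Sequence[Vector], top_k: int = 3) -> List[int]:
    """Return indices of the top_k signals most similar to the query embedding."""
    scores = [row[0] for row in similarity_matrix(signal_embs, [query_emb])]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:top_k]


if __name__ == "__main__":
    # Toy 2-D embeddings standing in for encoder outputs (hypothetical values).
    signals = [[1.0, 0.0], [0.0, 1.0]]
    matched_texts = [[1.0, 0.0], [0.0, 1.0]]
    print("loss (matched pairs):", symmetric_contrastive_loss(signals, matched_texts))
    print("ranking for query:", search([1.0, 0.1], signals, top_k=2))
```

Because both modalities are embedded in one shared space, search reduces to a nearest-neighbor lookup by cosine similarity, and no synonym dictionary is consulted at query time. Under this sketch, any robustness to paraphrased queries would have to come from the text encoder itself.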