Time series data are ubiquitous in domains such as finance, healthcare, and manufacturing, but their properties can vary significantly depending on the domain of origin. Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples. However, existing CTSR work typically retrieves time series from a single-domain database, which is inadequate when the user does not know the source of the query time series. This limitation motivates us to investigate CTSR in a setting where the database contains time series from multiple domains. To support this investigation, we introduce a CTSR benchmark dataset comprising time series from a variety of domains, such as motion, power demand, and traffic. The dataset is derived from a publicly available time series classification archive, making it easily accessible to researchers in the field. Using this benchmark, we compare several popular methods for modeling and retrieving time series. We also propose a novel distance learning model that outperforms the compared methods. Overall, our study highlights the importance of addressing CTSR across multiple domains and provides a benchmark dataset for future research.
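To make the multi-domain retrieval setting concrete, here is a minimal sketch that ranks a query against a database mixing several domains, using z-normalized Euclidean distance as a simple baseline. This is an illustrative assumption, not the proposed distance learning model: the dictionary layout, the function names (`z_normalize`, `euclidean`, `retrieve`), and the toy series are all hypothetical and not taken from the benchmark.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical layout: domain name -> list of equal-length series.
# The benchmark's actual storage format may differ.
Database = Dict[str, List[List[float]]]


def z_normalize(series: Sequence[float]) -> List[float]:
    """Shift to zero mean and scale to unit variance; a constant series maps to zeros."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((x - mean) ** 2 for x in series) / n)
    if std == 0:
        return [0.0] * n
    return [(x - mean) / std for x in series]


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two series of equal length."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def retrieve(query: Sequence[float], database: Database, k: int = 3) -> List[Tuple[str, int, float]]:
    """Rank every series in every domain by distance to the query and return the
    top k as (domain, index, distance). The query's own domain is never needed."""
    q = z_normalize(query)
    scored = [
        (domain, idx, euclidean(q, z_normalize(series)))
        for domain, series_list in database.items()
        for idx, series in enumerate(series_list)
    ]
    scored.sort(key=lambda item: item[2])
    return scored[:k]


# Toy data, made up for illustration (not from the benchmark).
database: Database = {
    "motion": [[0, 1, 2, 3, 2, 1, 0, -1], [3, 2, 1, 0, 1, 2, 3, 4]],
    "power_demand": [[1, 1, 1, 5, 5, 1, 1, 1]],
    "traffic": [[0, 0, 4, 4, 0, 0, 4, 4]],
}
# Same shape as database["motion"][0], but rescaled and offset.
query = [5, 15, 25, 35, 25, 15, 5, -5]

results = retrieve(query, database, k=2)
for domain, idx, dist in results:
    print(f"{domain}[{idx}] distance={dist:.3f}")
```

Because `retrieve` scans every domain, the user does not have to know where the query came from. Under this sketch's assumptions, a learned distance would take the place of `euclidean`.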