Temporal Treasure Hunt: Content-based Time Series Retrieval System for Discovering Insights

Time series data is ubiquitous across domains such as finance, healthcare, and manufacturing, but its properties can vary significantly depending on the domain it originates from. The ability to perform Content-based Time Series Retrieval (CTSR) is crucial for identifying unknown time series examples. However, existing CTSR work typically focuses on retrieving time series from a single-domain database, which can be inadequate if the user does not know the source of the query time series. This limitation motivates us to investigate the CTSR problem in a scenario where the database contains time series from multiple domains. To facilitate this investigation, we introduce a CTSR benchmark dataset that comprises time series data from a variety of domains, such as motion, power demand, and traffic. This dataset is sourced from a publicly available time series classification dataset archive, making it easily accessible to researchers in the field. We compare several popular methods for modeling and retrieving time series data using this benchmark dataset. Additionally, we propose a novel distance learning model that outperforms the existing methods. Overall, our study highlights the importance of addressing the CTSR problem across multiple domains and provides a useful benchmark dataset for future research.
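
To make the retrieval setting concrete, the following is a minimal sketch of content-based retrieval over a small mixed database. It is not the distance learning model proposed in the paper: it assumes a plain z-normalized Euclidean distance as the similarity measure, and the function names (`z_normalize`, `retrieve`) and the toy series are hypothetical, chosen only for illustration.

```python
import numpy as np


def z_normalize(x, eps=1e-8):
    """Rescale each series to zero mean and unit variance along the last axis."""
    x = np.asarray(x, dtype=float)
    mean = x.mean(axis=-1, keepdims=True)
    std = x.std(axis=-1, keepdims=True)
    return (x - mean) / (std + eps)


def retrieve(query, database, k=3):
    """Return indices and distances of the k database series closest to the query.

    Assumption: similarity is z-normalized Euclidean distance, a simple
    stand-in baseline rather than a learned distance.
    """
    q = z_normalize(query)
    db = z_normalize(database)
    dists = np.linalg.norm(db - q, axis=1)
    order = np.argsort(dists)[:k]
    return order, dists[order]


# Toy database with series of different shapes (all hypothetical).
t = np.linspace(0.0, 1.0, 64)
database = np.stack([
    np.sin(2 * np.pi * 2 * t),           # index 0: slow sine
    np.sin(2 * np.pi * 5 * t),           # index 1: faster sine
    t,                                   # index 2: linear trend
    np.sign(np.sin(2 * np.pi * 2 * t)),  # index 3: square wave
])

# The query is the slow sine with a different scale and offset.
query = 3.0 * np.sin(2 * np.pi * 2 * t) + 10.0
indices, distances = retrieve(query, database, k=2)
print(indices, distances)
```

Because both the query and the database are z-normalized, differences in scale and offset do not affect the ranking, which matters when series from different domains share one database. A learned distance would replace the `np.linalg.norm` step.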

