TSGBench: Time Series Generation Benchmark

Synthetic Time Series Generation (TSG) is crucial in a range of applications, including data augmentation, anomaly detection, and privacy preservation. Although significant strides have been made in this field, existing methods exhibit three key limitations: (1) They are often benchmarked only against similar model types, which prevents a holistic view of their performance. (2) They rely on specialized synthetic and private datasets, which introduces bias and hampers generalizability. (3) Their evaluation measures are ambiguous and often tied to custom networks or downstream tasks, which hinders consistent and fair comparison. To overcome these limitations, we introduce \textsf{TSGBench}, the inaugural Time Series Generation Benchmark, designed for a unified and comprehensive assessment of TSG methods. It comprises three modules: (1) a curated collection of publicly available, real-world datasets tailored for TSG, together with a standardized preprocessing pipeline; (2) a comprehensive suite of evaluation measures, including vanilla measures, new distance-based assessments, and visualization tools; (3) a pioneering generalization test rooted in Domain Adaptation (DA), compatible with all methods. We conduct comprehensive experiments using \textsf{TSGBench} on ten real-world datasets from diverse domains, with ten advanced TSG methods and twelve evaluation measures. The results highlight the reliability and efficacy of \textsf{TSGBench} in evaluating TSG methods. Crucially, \textsf{TSGBench} delivers a statistical analysis of the performance rankings of these methods, showing how their performance varies across datasets and measures and offering nuanced insights into the effectiveness of each method.
