Disjoint or Overlapping? Inference Windowing for Reconstruction-Based Time Series Anomaly Detection

Reconstruction-based methods are widely used for time series anomaly detection: models are trained to reconstruct subsequences, and anomalies are identified by their reconstruction errors. However, reported results are often hard to compare because of heterogeneous evaluation practices and underspecified inference procedures. In this paper, we revisit reconstruction-based anomaly detection in the univariate offline setting and study the role of the inference stride, which controls whether subsequences are processed as disjoint windows or with overlap. We propose a unified training, tuning, and multi-seed evaluation protocol on the curated TSB-AD benchmark, and study how overlapping inference affects anomaly detection performance for a range of reconstruction models, including PCA-based baselines, DLinear, an AutoEncoder, TimesNet, and Transformer variants. The results show that overlapping windows yield consistent improvements across all models, with average relative gains of up to 28%, and can alter method rankings. We further analyze variability across datasets, random seeds, and hyperparameter configurations. Finally, we complement the benchmark study with an evaluation on the full UCR archive using localization criteria aligned with sliding-window reconstruction. Overall, our results highlight that reconstruction-based anomaly detection performance depends not only on model architecture and training, but also on inference choices, motivating a clear and reproducible protocol. They also show that reconstruction-based baselines achieve strong performance on both the TSB-AD and UCR benchmarks, supporting them as competitive and practical approaches for univariate time series anomaly detection.
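
To make the role of the inference stride concrete, the following is a minimal sketch of windowed scoring, not the protocol used in the paper. It assumes that per-point scores are obtained by averaging squared reconstruction errors over every window that covers a point, which is one common choice; the paper's actual aggregation rule is not stated in the abstract. The names `window_starts`, `anomaly_scores`, and `mean_reconstructor` are hypothetical, and the mean-based reconstructor is only a toy stand-in for a trained model.

```python
from typing import Callable, List, Sequence


def window_starts(n: int, window: int, stride: int) -> List[int]:
    """Start indices of the windows used to cover a series of length n."""
    if not (1 <= stride <= window <= n):
        raise ValueError("expected 1 <= stride <= window <= n")
    starts = list(range(0, n - window + 1, stride))
    # Add a final window flush with the end so that every point is covered.
    if starts[-1] != n - window:
        starts.append(n - window)
    return starts


def anomaly_scores(
    series: Sequence[float],
    reconstruct: Callable[[Sequence[float]], Sequence[float]],
    window: int,
    stride: int,
) -> List[float]:
    """Per-point score: squared error averaged over all covering windows.

    Averaging is an assumption made for this sketch, not the paper's rule.
    """
    n = len(series)
    error_sum = [0.0] * n
    coverage = [0] * n
    for start in window_starts(n, window, stride):
        segment = series[start:start + window]
        recon = reconstruct(segment)
        for offset, (x, x_hat) in enumerate(zip(segment, recon)):
            error_sum[start + offset] += (x - x_hat) ** 2
            coverage[start + offset] += 1
    return [s / c for s, c in zip(error_sum, coverage)]


def mean_reconstructor(segment: Sequence[float]) -> List[float]:
    """Toy stand-in for a trained model: reconstruct a window by its mean."""
    mean = sum(segment) / len(segment)
    return [mean] * len(segment)


series = [0.0] * 12
series[6] = 10.0  # a single injected spike

# stride == window gives disjoint windows; stride < window gives overlap.
disjoint = anomaly_scores(series, mean_reconstructor, window=4, stride=4)
overlapping = anomaly_scores(series, mean_reconstructor, window=4, stride=1)
```

In this toy setting, setting `stride` equal to `window` scores each point from exactly one window, so a point's score depends on where the window boundaries happen to fall. With `stride=1`, interior points are scored from up to `window` different contexts, which smooths the score around the spike. The cost is roughly `window / stride` times as many reconstruction passes.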
