MSTN: A Lightweight and Fast Model for General Time Series Analysis

Real-world time series often exhibit strong non-stationarity, complex nonlinear dynamics, and behavior expressed across multiple temporal scales, from rapid local fluctuations to slowly evolving long-range trends. However, many contemporary architectures impose rigid, fixed-scale structural priors (such as patch-based tokenization, predefined receptive fields, or frozen backbone encoders), which can over-regularize temporal dynamics and limit adaptability to abrupt high-magnitude events. To address this, we introduce the Multi-scale Temporal Network (MSTN), a hybrid neural architecture grounded in an Early Temporal Aggregation principle. MSTN integrates three complementary components: (i) a multi-scale convolutional encoder that captures fine-grained local structure; (ii) a sequence modeling module that learns long-range dependencies through either recurrent or attention-based mechanisms; and (iii) a self-gated fusion stage that combines squeeze-excitation with a single dense layer to dynamically reweight and fuse multi-scale representations. This design enables MSTN to flexibly model temporal patterns spanning milliseconds to extended horizons, while avoiding the computational burden typically associated with long-context models. Across extensive benchmarks covering imputation, long-term forecasting, short-term forecasting, classification, and cross-dataset generalization, MSTN achieves state-of-the-art performance, establishing new best results on 33 of 40 datasets. It remains lightweight ($\sim$278,520 parameters for MSTN-BiLSTM and $\sim$950,776, i.e. $\approx$1M, for MSTN-Transformer) and is suitable for low-latency inference ($<$1 sec, often in milliseconds) and resource-constrained deployment.
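To make the data flow of components (i) and (iii) concrete, the following is a minimal, dependency-free sketch of multi-scale 1D convolution followed by squeeze-excitation gating and a single dense fusion layer. It is an illustration under stated assumptions, not the MSTN implementation: the kernel sizes, the moving-average filters standing in for learned convolutions, the random gating weights, and all function names are hypothetical, and the sequence modeling module (ii) is omitted entirely.

```python
import math
import random


def conv1d_same(x, kernel):
    """1D cross-correlation with zero padding; output length equals input length (odd kernels)."""
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(x) + [0.0] * pad
    return [sum(w * padded[t + j] for j, w in enumerate(kernel)) for t in range(len(x))]


def multi_scale_encode(x, kernel_sizes=(3, 7, 15)):
    """One channel per scale. Moving-average kernels are a stand-in for learned filters (assumption)."""
    return [conv1d_same(x, [1.0 / k] * k) for k in kernel_sizes]


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def matvec(matrix, vector):
    return [sum(w * a for w, a in zip(row, vector)) for row in matrix]


def squeeze_excite(channels, w_reduce, w_expand):
    """Squeeze-excitation: per-channel gates computed from global channel statistics."""
    # Squeeze: global average over time, one scalar per channel.
    z = [sum(c) / len(c) for c in channels]
    # Excitation: bottleneck layer with ReLU, then expansion with sigmoid.
    hidden = [max(0.0, h) for h in matvec(w_reduce, z)]
    gates = [sigmoid(g) for g in matvec(w_expand, hidden)]
    # Reweight every channel by its gate.
    scaled = [[g * v for v in c] for g, c in zip(gates, channels)]
    return scaled, gates


def dense_fuse(channels, weights, bias=0.0):
    """Single dense layer across channels, applied independently at each time step."""
    steps = len(channels[0])
    return [sum(w * c[t] for w, c in zip(weights, channels)) + bias for t in range(steps)]


# Toy input: a sine wave with one abrupt high-magnitude spike at t = 40.
rng = random.Random(0)
series = [math.sin(0.2 * t) + (3.0 if t == 40 else 0.0) for t in range(64)]

channels = multi_scale_encode(series)  # 3 channels, one per scale
w_reduce = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(2)]  # 3 -> 2 bottleneck
w_expand = [[rng.uniform(-1, 1) for _ in range(2)] for _ in range(3)]  # 2 -> 3 gates
scaled, gates = squeeze_excite(channels, w_reduce, w_expand)
fused = dense_fuse(scaled, weights=[1.0 / 3] * 3)
```

In this sketch the short kernel preserves the spike at `t = 40` while the long kernel smooths it into the trend, and the gates decide how much each scale contributes to the fused output. In a trained model the filters, gating weights, and fusion weights would all be learned rather than fixed or random.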
