Time series data underpins many domains (e.g., finance and climate science), but its rapid growth strains storage and computation. Dataset condensation can alleviate this by synthesizing a compact training set that preserves key information. Yet most condensation methods are image-centric and often fail on time series because they miss time-series-specific temporal structure, especially local discriminative motifs such as shapelets. In this work, we propose ShapeCond, a novel and efficient condensation framework for time series classification that leverages shapelet-based dataset knowledge via a shapelet-guided optimization strategy. Our shapelet-assisted synthesis cost is independent of sequence length, so longer series yield larger speedups in synthesis (e.g., 29$\times$ faster than CondTSC, the prior state-of-the-art time series condensation method, and up to 10,000$\times$ faster than naively using shapelets on the Sleep dataset with 3,000 timesteps). By explicitly preserving critical local patterns, ShapeCond improves downstream accuracy and consistently outperforms prior state-of-the-art time series dataset condensation methods across extensive experiments. Code is available at https://github.com/lunaaa95/ShapeCond.
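For readers unfamiliar with shapelets, the standard notion of a shapelet match is the minimum Euclidean distance between a short discriminative subsequence and any equal-length window of a series. The sketch below illustrates only this textbook definition, not ShapeCond's synthesis procedure; the function name and toy data are our own.

```python
import numpy as np

def shapelet_distance(series: np.ndarray, shapelet: np.ndarray) -> float:
    """Minimum Euclidean distance between a shapelet and any
    equal-length subsequence of the series (standard shapelet matching)."""
    L = len(shapelet)
    # Slide the shapelet over the series and keep the best-matching window.
    dists = [np.linalg.norm(series[i:i + L] - shapelet)
             for i in range(len(series) - L + 1)]
    return float(min(dists))

# Toy example: the motif [0, 1, 0] occurs exactly inside the series,
# so the minimum distance is 0.
series = np.array([0.0, 0.0, 1.0, 0.0, 0.0])
shapelet = np.array([0.0, 1.0, 0.0])
print(shapelet_distance(series, shapelet))  # 0.0
```

Note that this naive matching scans every window of every series, which is why its cost grows with sequence length; the abstract's claim is that ShapeCond's shapelet-assisted synthesis avoids this dependence.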