Data Augmentation (DA) has become an indispensable strategy in Time Series Classification (TSC) because it enlarges the training set, which improves model robustness, diversifies the data, and reduces overfitting. However, current work on DA for TSC suffers from fragmented literature reviews, unclear methodological taxonomies, inadequate evaluation measures, and a lack of accessible, user-oriented tools. To address these problems, this study presents a comprehensive examination of DA methods for TSC. We first conducted a literature review spanning a decade and found that existing surveys capture little of the breadth of advances in DA for TSC. We therefore analyzed over 100 scholarly articles and distilled more than 60 distinct DA techniques. From this analysis we derived a novel taxonomy designed specifically for DA in TSC, which groups techniques into five principal categories: Transformation-Based, Pattern-Based, Generative, Decomposition-Based, and Automated Data Augmentation. The taxonomy is intended to guide researchers in method selection. To address the lack of comprehensive evaluations of widely used DA techniques, we conducted an empirical assessment of more than 15 DA strategies on 8 UCR time-series datasets, using ResNet and a multi-faceted evaluation scheme comprising Accuracy, Method Ranking, and Residual Analysis, and obtained a benchmark accuracy of 88.94 ± 11.83%. Our investigation underscored the inconsistent efficacy of DA techniques, with...
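The evaluation scheme mentioned above combines per-method accuracy with a ranking of methods across datasets. As a rough illustration of how such a summary might be computed, here is a minimal Python sketch. It is not the authors' evaluation code: the function names, the method names, and the accuracy values are hypothetical placeholders, and the use of the sample standard deviation and of average ranks for ties are assumptions.

```python
from statistics import mean, stdev


def summarize(acc):
    """Return method -> (mean accuracy, sample standard deviation).

    `acc` maps a method name to a list of accuracies, one per dataset.
    Assumption: the spread is reported as a sample standard deviation.
    """
    return {method: (mean(vals), stdev(vals)) for method, vals in acc.items()}


def average_ranks(acc):
    """Return method -> mean rank over datasets (1 = best accuracy).

    Tied methods share the average of the ranks they occupy.
    """
    methods = list(acc)
    n_datasets = len(next(iter(acc.values())))
    totals = {method: 0.0 for method in methods}
    for d in range(n_datasets):
        # Accuracies on dataset d, best first.
        scores = sorted((acc[m][d] for m in methods), reverse=True)
        for m in methods:
            first = scores.index(acc[m][d])   # 0-based position of the first tied score
            count = scores.count(acc[m][d])   # number of methods sharing this score
            # Average of the ranks first+1 .. first+count.
            totals[m] += first + (count + 1) / 2
    return {m: totals[m] / n_datasets for m in methods}


# Hypothetical accuracies for three methods on three datasets (illustration only).
toy = {
    "baseline": [0.80, 0.90, 0.70],
    "jitter": [0.85, 0.88, 0.70],
    "scaling": [0.75, 0.92, 0.65],
}

for name, (mu, sd) in summarize(toy).items():
    print(f"{name}: {100 * mu:.2f} ± {100 * sd:.2f}%")
print(average_ranks(toy))
```

Reporting the mean rank alongside the mean accuracy helps when datasets differ widely in difficulty, since a large standard deviation across datasets can hide which method wins more consistently.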