Algorithm selection is a crucial step in designing AI services for real-world time series classification use cases. Traditional methods such as neural architecture search, automated machine learning, combined algorithm selection, and hyperparameter optimization are effective, but they require considerable computational resources and access to all data points to run their optimizations. In this work, we introduce a novel data fingerprint that describes any time series classification dataset in a privacy-preserving manner and provides insight into the algorithm selection problem without requiring training on the (unseen) dataset. By decomposing the multi-target regression problem, our approach uses only the data fingerprints to estimate algorithm performance and uncertainty in a scalable and adaptable manner. We evaluate the approach on the 112 University of California, Riverside (UCR) benchmark datasets, where it predicts the performance of 35 state-of-the-art algorithms and provides valuable insights for effective algorithm selection in time series classification service systems. It improves on a naive baseline by 7.32% on average in estimating the mean performance and by 15.81% in estimating the uncertainty.
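To make the idea of decomposing the multi-target regression problem more concrete, the following is a minimal sketch, not the method evaluated in this work. It assumes each dataset is summarized by a small numeric fingerprint vector and treats every algorithm as its own independent estimation problem. The fingerprint features, the dataset and algorithm names, the numbers, and the use of a simple nearest-neighbour estimator are all illustrative assumptions.

```python
import math
import statistics

# Hypothetical fingerprints: one small feature vector per known dataset.
# The actual fingerprint features are not specified here; these are placeholders.
known_fingerprints = {
    "dataset_a": [0.1, 0.9],
    "dataset_b": [0.2, 0.8],
    "dataset_c": [0.9, 0.1],
}

# Hypothetical benchmark results: algorithm -> (mean accuracy, std across runs).
known_results = {
    "dataset_a": {"algo_x": (0.90, 0.02), "algo_y": (0.70, 0.05)},
    "dataset_b": {"algo_x": (0.80, 0.04), "algo_y": (0.74, 0.03)},
    "dataset_c": {"algo_x": (0.60, 0.06), "algo_y": (0.88, 0.01)},
}


def nearest(fingerprint, k):
    """Return the names of the k known datasets closest to the fingerprint."""
    ranked = sorted(
        known_fingerprints,
        key=lambda name: math.dist(fingerprint, known_fingerprints[name]),
    )
    return ranked[:k]


def estimate(fingerprint, algorithm, k=2):
    """Estimate (mean, std) for ONE algorithm from its k nearest datasets.

    Each algorithm is handled independently, which is the decomposition of
    the multi-target problem into single-target problems. The estimator
    (a plain k-nearest-neighbour average) is an assumption for illustration.
    """
    neighbours = nearest(fingerprint, k)
    means = [known_results[name][algorithm][0] for name in neighbours]
    stds = [known_results[name][algorithm][1] for name in neighbours]
    return statistics.fmean(means), statistics.fmean(stds)


def rank_algorithms(fingerprint, k=2):
    """Rank all algorithms by estimated mean performance, best first."""
    algorithms = sorted({a for res in known_results.values() for a in res})
    estimates = {a: estimate(fingerprint, a, k) for a in algorithms}
    return sorted(estimates.items(), key=lambda item: item[1][0], reverse=True)


if __name__ == "__main__":
    # Only the fingerprint of the unseen dataset is needed, not its raw data.
    for name, (mean, std) in rank_algorithms([0.15, 0.85]):
        print(f"{name}: estimated mean={mean:.3f}, estimated std={std:.3f}")
```

Because each algorithm has its own estimator, adding a new algorithm in this sketch only requires adding its results to `known_results`; the estimates for the other algorithms are unaffected.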