Due to the sweeping digitalization of processes, increasingly vast amounts of time series data are being produced. Accurate classification of such time series facilitates decision making in multiple domains. State-of-the-art classification accuracy is often achieved by ensemble learning, where results are synthesized from multiple base models. This characteristic implies that ensemble learning needs substantial computing resources, preventing its use in resource-limited environments such as edge devices. To extend the applicability of ensemble learning, we propose the LightTS framework, which compresses large ensembles into lightweight models while ensuring competitive accuracy. First, we propose adaptive ensemble distillation, which assigns adaptive weights to different base models so that their varying classification capabilities contribute purposefully to the training of the lightweight model. Second, we propose means of identifying Pareto optimal settings w.r.t. model accuracy and model size, thus enabling users with a space budget to select the most accurate lightweight model. We report on experiments using 128 real-world time series data sets and different types of base models that justify key decisions in the design of LightTS and provide evidence that LightTS is able to outperform competitors.
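To make the notion of Pareto optimal settings w.r.t. accuracy and size concrete, the following is a minimal sketch of how a user with a space budget might pick a model from a set of already-evaluated candidates. It is not the LightTS search procedure itself, which the abstract does not detail; the names `Candidate`, `pareto_front`, and `best_within_budget`, as well as the sizes and accuracies in the usage lines, are hypothetical and chosen purely for illustration.

```python
from typing import List, NamedTuple, Optional


class Candidate(NamedTuple):
    """One evaluated lightweight-model setting (hypothetical structure)."""
    name: str
    size_kb: float    # model size; smaller is better
    accuracy: float   # classification accuracy; larger is better


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Return candidates not dominated in (size, accuracy).

    A candidate is dominated if another one is no larger and no less
    accurate, and strictly better in at least one of the two.
    """
    # Sort by size ascending; for equal sizes, most accurate first.
    ordered = sorted(candidates, key=lambda c: (c.size_kb, -c.accuracy))
    front: List[Candidate] = []
    best_accuracy = float("-inf")
    for c in ordered:
        # Keep a candidate only if it beats every smaller-or-equal model.
        if c.accuracy > best_accuracy:
            front.append(c)
            best_accuracy = c.accuracy
    return front


def best_within_budget(
    candidates: List[Candidate], budget_kb: float
) -> Optional[Candidate]:
    """Pick the most accurate Pareto-optimal candidate that fits the budget."""
    feasible = [c for c in pareto_front(candidates) if c.size_kb <= budget_kb]
    if not feasible:
        return None
    return max(feasible, key=lambda c: c.accuracy)


# Illustrative, made-up candidates (not results from the paper).
candidates = [
    Candidate("a", 10.0, 0.80),
    Candidate("b", 20.0, 0.78),  # dominated by "a"
    Candidate("c", 30.0, 0.85),
    Candidate("d", 30.0, 0.83),  # dominated by "c"
    Candidate("e", 50.0, 0.85),  # dominated by "c" (same accuracy, larger)
]
front_names = [c.name for c in pareto_front(candidates)]
choice = best_within_budget(candidates, budget_kb=25.0)
```

Because the front is sorted by size with strictly increasing accuracy, a budget query reduces to taking the last front entry that still fits, which is why restricting the search to Pareto optimal settings loses nothing for any budget.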