Predictive queries over spatiotemporal (ST) stream data pose significant data processing and analysis challenges. An ST data stream comprises a set of time series whose data distributions may vary across both space and time, exhibiting multiple distinct patterns. In this context, assuming that a single machine learning model can adequately handle such variations is likely to fail. To address this challenge, we propose StreamEnsemble, a novel approach to predictive queries over ST data that dynamically selects and allocates machine learning models according to the underlying time series distributions and model characteristics. Our experimental evaluation shows that this method markedly outperforms traditional ensemble methods and single-model approaches in both accuracy and time, reducing prediction error by more than a factor of 10 compared to traditional approaches.
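To make the idea of allocating models according to the underlying time series distributions concrete, the following is a minimal sketch and not the StreamEnsemble algorithm itself. It assumes, purely for illustration, that each candidate model is described by a (mean, standard deviation) summary of the data it was trained on, and that each spatial cell receives the candidate whose summary is closest to the cell's most recent window. The names `summarize`, `select_model`, `allocate`, `calm_model`, and `stormy_model` are hypothetical.

```python
from statistics import mean, pstdev


def summarize(window):
    """Summarize a window of values as (mean, population standard deviation)."""
    return (mean(window), pstdev(window))


def distance(a, b):
    """Euclidean distance between two (mean, std) summaries."""
    return ((a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2) ** 0.5


def select_model(window, candidates):
    """Return the name of the candidate whose training summary is closest
    to the summary of the given window (an assumed selection rule)."""
    s = summarize(window)
    return min(candidates, key=lambda name: distance(s, candidates[name]["summary"]))


def allocate(windows_by_cell, candidates):
    """Map each spatial cell to the candidate selected for its current window."""
    return {cell: select_model(w, candidates) for cell, w in windows_by_cell.items()}


# Hypothetical candidates, each described only by its training-data summary.
candidates = {
    "calm_model": {"summary": summarize([1.0, 1.1, 0.9, 1.0])},
    "stormy_model": {"summary": summarize([5.0, 9.0, 2.0, 8.0])},
}

# Most recent window of observations for two grid cells (illustrative values).
windows = {
    (0, 0): [1.0, 0.95, 1.05, 1.0],
    (0, 1): [4.0, 9.5, 3.0, 7.0],
}

allocation = allocate(windows, candidates)
print(allocation)
```

Re-running `allocate` on each new window lets the assignment change as the stream's distribution drifts. A real system would likely use a richer distribution comparison and also weigh model characteristics such as cost, which this sketch omits.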