Time series forecasting and anomaly detection are common tasks for practitioners in industries such as retail, manufacturing, advertising, and energy. Two challenges stand out: (1) forecasting, or detecting anomalies in, large volumes of time series automatically, efficiently, and accurately; and (2) keeping the results interpretable so that business insights can be incorporated effectively. We present ARIMA_PLUS, a novel framework that overcomes both challenges through a unique combination of (a) accurate and interpretable time series models and (b) scalable, fully managed system infrastructure. The model has a sequential, modular structure that handles the different components of a time series, including holiday effects, seasonality, trend, and anomalies, which makes the results highly interpretable. We introduce novel enhancements to each module and establish a unified framework that addresses forecasting and anomaly detection simultaneously. In terms of accuracy, a comprehensive benchmark on 42 public datasets from the Monash forecasting repository shows that ARIMA_PLUS outperforms not only well-established statistical alternatives (such as ETS, ARIMA, TBATS, and Prophet) but also newer neural network models (such as DeepAR, N-BEATS, PatchTST, and TimeMixer). In terms of infrastructure, it is built directly into the query engine of BigQuery in Google Cloud. It uses a simple SQL interface and automates tedious technicalities such as data cleaning and model selection. It scales automatically with managed cloud compute and storage resources, making it possible to forecast 100 million time series in only 1.5 hours, a throughput of more than 18,000 time series per second. In terms of interpretability, we present several case studies that demonstrate the time series insights it generates and the customizability it offers.
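To make the idea of a sequential, modular decomposition concrete, here is a toy Python sketch. It is not the ARIMA_PLUS implementation and does not use BigQuery: the module order (trend, then seasonality, then holidays, then anomalies) and every estimator in it (an ordinary least-squares line, per-phase medians, one shared holiday lift, a MAD-based outlier rule) are assumptions chosen for brevity, and all function names are hypothetical. What it does illustrate is the property credited above for interpretability: each module removes one named component and passes only the remainder to the next, so every fitted piece can be inspected on its own.

```python
import random
from statistics import mean, median


def fit_trend(y):
    """Hypothetical module 1: ordinary least-squares line over t = 0..n-1."""
    n = len(y)
    t_bar = (n - 1) / 2
    y_bar = mean(y)
    sxx = sum((t - t_bar) ** 2 for t in range(n))
    sxy = sum((t - t_bar) * (v - y_bar) for t, v in enumerate(y))
    slope = sxy / sxx
    return [y_bar + slope * (t - t_bar) for t in range(n)]


def fit_seasonality(resid, period):
    """Hypothetical module 2: per-phase median, centred so one cycle averages to 0."""
    phase = [median(resid[p::period]) for p in range(period)]
    level = mean(phase)
    return [phase[t % period] - level for t in range(len(resid))]


def fit_holidays(resid, holidays):
    """Hypothetical module 3: one shared holiday lift, measured against the median day."""
    if not holidays:
        return [0.0] * len(resid)
    lift = mean(resid[t] for t in holidays) - median(resid)
    return [lift if t in holidays else 0.0 for t in range(len(resid))]


def flag_anomalies(resid, k=5.0):
    """Hypothetical module 4: robust z-score rule based on the median absolute deviation."""
    center = median(resid)
    mad = median(abs(r - center) for r in resid)
    scale = 1.4826 * mad  # converts MAD to a standard deviation under normality
    if scale == 0:
        return []
    return [t for t, r in enumerate(resid) if abs(r - center) > k * scale]


def decompose(y, period, holidays=()):
    """Run the modules in sequence; each sees only what the previous one left over."""
    holidays = set(holidays)
    trend = fit_trend(y)
    r1 = [v - tr for v, tr in zip(y, trend)]
    season = fit_seasonality(r1, period)
    r2 = [v - s for v, s in zip(r1, season)]
    holiday = fit_holidays(r2, holidays)
    resid = [v - h for v, h in zip(r2, holiday)]
    return {
        "trend": trend,
        "seasonality": season,
        "holiday": holiday,
        "residual": resid,
        "anomalies": flag_anomalies(resid),
    }


# Synthetic daily series: linear trend + weekly pattern + holiday lift + one spike.
rng = random.Random(0)
n = 140
weekly = [2, -1, -1, 0, -1, -1, 2]
holiday_days = {30, 75, 110}
y = []
for t in range(n):
    v = 10 + 0.1 * t + weekly[t % 7] + rng.uniform(-0.3, 0.3)
    if t in holiday_days:
        v += 6
    if t == 70:
        v += 30  # injected anomaly
    y.append(v)

parts = decompose(y, period=7, holidays=holiday_days)
print(parts["anomalies"])  # expected: [70]
```

Because the components are estimated one at a time, any single module (for example, the straight-line trend) could in principle be swapped for a richer model without touching the others; whether ARIMA_PLUS organizes its modules exactly this way is not claimed here.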