Many modern systems, such as financial, transportation, and telecommunications systems, are time-sensitive: they demand low-latency predictions for real-time decision-making. Such systems must often contend with continuous, unbounded data streams as well as concept drift, requirements that traditional regression techniques cannot meet. Novel data stream regression methods are needed to handle these scenarios. We present a database-inspired data stream regression model that (a) draws on R*-trees to create granules from incoming data streams such that relevant information is retained, (b) iteratively forgets granules whose information is deemed outdated, thus maintaining a list of only recent, relevant granules, and (c) uses the recent data and granules to provide low-latency predictions. The R*-tree-inspired approach also makes the algorithm amenable to integration with database systems. Our experiments demonstrate that the method's ability to discard data yields an order-of-magnitude improvement in latency and training time when evaluated against the most accurate state-of-the-art algorithms, while the R*-tree-inspired granulation technique provides competitively accurate predictions.
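The three steps (a) to (c) can be illustrated with a minimal Python sketch. This is not the authors' algorithm: the class names `Granule` and `GranularStreamRegressor`, the parameters `radius` and `max_age`, and the nearest-center merging rule are all assumptions made for illustration. In particular, a granule here is a plain axis-aligned bounding box with a running mean of its targets, which only loosely echoes an R*-tree node and uses none of the R*-tree insertion or split heuristics.

```python
from dataclasses import dataclass
import math


@dataclass
class Granule:
    """Hypothetical granule: a bounding box plus target statistics."""
    lo: list        # lower corner of the bounding box
    hi: list        # upper corner of the bounding box
    y_sum: float    # sum of the targets absorbed so far
    count: int      # number of points absorbed so far
    last_seen: int  # time step of the most recent update

    def center(self):
        return [(a + b) / 2 for a, b in zip(self.lo, self.hi)]

    def mean_target(self):
        return self.y_sum / self.count


class GranularStreamRegressor:
    """Illustrative sketch only; `radius` and `max_age` are assumed knobs."""

    def __init__(self, radius=1.0, max_age=100):
        self.radius = radius    # max distance at which a point joins a granule
        self.max_age = max_age  # steps without updates before a granule is dropped
        self.granules = []
        self.t = 0

    def _nearest(self, x):
        # Linear scan for the granule whose box center is closest to x.
        best, best_d = None, math.inf
        for g in self.granules:
            d = math.dist(x, g.center())
            if d < best_d:
                best, best_d = g, d
        return best, best_d

    def learn_one(self, x, y):
        self.t += 1
        g, d = self._nearest(x)
        if g is not None and d <= self.radius:
            # (a) Granulate: grow the nearest box to cover x, update its stats.
            g.lo = [min(a, b) for a, b in zip(g.lo, x)]
            g.hi = [max(a, b) for a, b in zip(g.hi, x)]
            g.y_sum += y
            g.count += 1
            g.last_seen = self.t
        else:
            # No granule is close enough: start a new one at x.
            self.granules.append(Granule(list(x), list(x), y, 1, self.t))
        # (b) Forget: drop granules that have not been updated recently.
        self.granules = [
            g for g in self.granules if self.t - g.last_seen <= self.max_age
        ]

    def predict_one(self, x):
        # (c) Predict: mean target of the nearest surviving granule.
        g, _ = self._nearest(x)
        return g.mean_target() if g is not None else 0.0
```

Because stale granules are discarded, the cost of each `learn_one` and `predict_one` call depends on the number of live granules rather than on the length of the stream. That is the intuition behind the latency claim, although this sketch makes no attempt to reproduce the reported results.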