This work addresses the problem of frameworks for data stream processing that allow solutions to be evaluated in an environment resembling real-world applications. The definition of structured frameworks stems from the need to evaluate data stream classification methods reliably under the constraints of delayed and limited label access. Current experimental evaluation often relies without restriction on the assumption that labels are completely and immediately available, both to monitor recognition quality and to adapt methods to changing concepts. The problem is addressed by reviewing currently described methods and techniques for data stream processing and verifying their outcomes in a simulated environment. The result of the work is a proposed taxonomy of data stream processing frameworks that shows the linkage between drift detection and classification methods while accounting for the natural phenomenon of label delay.