AI-based digital twins are at the leading edge of the Industry 4.0 revolution, technologically empowered by the Internet of Things and real-time data analysis. Information collected from industrial assets is produced continuously, yielding data streams that must be processed under stringent timing constraints. Such data streams are usually subject to non-stationary phenomena: the data distribution of the stream may change over time, so the knowledge captured by the models used for data analysis may become obsolete (the so-called concept drift effect). Early detection of the change (drift) is crucial for updating the model's knowledge, and it is especially challenging in scenarios where the ground truth associated with the stream data is not readily available. Among many other techniques, estimating the model's confidence has been tentatively suggested in a few studies as a criterion for detecting drifts in unsupervised settings. The goal of this manuscript is to confirm and firmly establish the connection between the model's confidence in its output and the presence of a concept drift, demonstrating it experimentally and advocating for greater consideration of uncertainty estimation in future comparative studies.