Process curves are multivariate finite time series data produced by manufacturing processes. This paper studies machine learning methods that detect drifts in process curve datasets. A theoretical framework for synthetically generating process curves in a controlled way is introduced in order to benchmark machine learning algorithms for process drift detection. An evaluation score, called the temporal area under the curve, is also introduced; it quantifies how well machine learning models uncover curves belonging to drift segments. Finally, a benchmark study compares popular machine learning approaches on synthetic data generated with the introduced framework and shows that existing algorithms often struggle with datasets containing multiple drift segments.