Data-driven soft sensors are widely used in industrial and chemical processes to predict process variables that are difficult to measure during routine operation. The regression models behind these sensors often require many labeled examples, yet labels can be very expensive to obtain because quality inspections take considerable time and money. In this context, active learning methods can be highly beneficial, since they suggest which labels are most informative to query. However, most active learning strategies proposed for regression target the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also show how a semi-supervised architecture based on orthogonal autoencoders can learn salient features in a lower-dimensional space. The Tennessee Eastman Process is used to compare the predictive performance of the proposed approaches.
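The abstract does not name the strategies that are adapted. To make the stream-based idea concrete, the following is a minimal sketch of one common regression strategy, query-by-committee, turned into a stream rule. Each arriving sample is queried only if the committee's prediction variance exceeds a fixed threshold and labeling budget remains. The bootstrap ridge committee, the fixed threshold, and the helper names (`fit_committee`, `committee_variance`, `stream_qbc`, `oracle`) are all illustrative assumptions, not the paper's method.

```python
import numpy as np

def fit_committee(X, y, n_members, rng, ridge=1e-3):
    """Hypothetical helper: fit ridge regressors (with bias) on bootstrap resamples."""
    n, d = X.shape
    members = []
    for _ in range(n_members):
        idx = rng.integers(0, n, size=n)               # bootstrap resample indices
        Xb = np.hstack([X[idx], np.ones((n, 1))])      # append bias column
        # Closed-form ridge solution; the ridge term keeps the system solvable
        w = np.linalg.solve(Xb.T @ Xb + ridge * np.eye(d + 1), Xb.T @ y[idx])
        members.append(w)
    return np.array(members)                           # shape (n_members, d + 1)

def committee_variance(W, x):
    """Disagreement score: variance of the committee's predictions for one sample."""
    return (W @ np.append(x, 1.0)).var()

def stream_qbc(X_init, y_init, stream, oracle, threshold, budget, n_members=5, seed=0):
    """Scan the stream once; query the (assumed) oracle when disagreement is high."""
    rng = np.random.default_rng(seed)
    X_lab, y_lab = list(X_init), list(y_init)
    W = fit_committee(np.array(X_lab), np.array(y_lab), n_members, rng)
    queried = []
    for t, x in enumerate(stream):
        if len(queried) < budget and committee_variance(W, x) > threshold:
            X_lab.append(x)
            y_lab.append(oracle(x))                    # costly label, e.g. a lab analysis
            queried.append(t)
            # Refit the committee with the newly labeled sample
            W = fit_committee(np.array(X_lab), np.array(y_lab), n_members, rng)
    return queried, W
```

In practice the threshold trades labeling cost against model quality: a lower value queries more often and exhausts the budget sooner.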