Data-driven soft sensors are widely used in industrial and chemical processes to predict process variables that are hard to measure and whose true values are difficult to track during routine operation. The regression models behind these sensors often require a large number of labeled examples, yet labels are expensive to obtain because quality inspections are time-consuming and costly. In this context, active learning methods can be highly beneficial, as they suggest which data points are most informative to label. However, most active learning strategies proposed for regression focus on the offline setting. In this work, we adapt some of these approaches to the stream-based scenario and show how they can be used to select the most informative data points. We also demonstrate how to use a semi-supervised architecture based on orthogonal autoencoders to learn salient features in a lower-dimensional space. The Tennessee Eastman Process is used to compare the predictive performance of the proposed approaches.
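To make the stream-based setting concrete, the following is a minimal sketch of one possible query rule, not the method proposed in this work. It assumes a Bayesian linear regression model and a fixed threshold on the predictive variance: each point is seen once, in order, and its label is requested only if the model is sufficiently uncertain and the labeling budget is not exhausted. The names `BayesianLinearRegressor`, `stream_active_learning`, `oracle`, `threshold`, and `budget` are hypothetical and chosen for illustration only.

```python
import numpy as np


class BayesianLinearRegressor:
    """Bayesian linear regression with a Gaussian prior and known noise variance.

    Illustrative stand-in for the soft sensor's regression model (an assumption).
    """

    def __init__(self, n_features, prior_precision=1.0, noise_var=0.1):
        self.noise_var = noise_var
        # Posterior precision matrix, initialised to the prior precision.
        self.precision = prior_precision * np.eye(n_features)
        # Accumulates X^T y / noise_var; posterior mean = inv(precision) @ b.
        self.b = np.zeros(n_features)

    def update(self, x, y):
        """Incorporate one labeled observation (x, y)."""
        self.precision += np.outer(x, x) / self.noise_var
        self.b += x * y / self.noise_var

    def predict(self, x):
        """Return the predictive mean and variance at x."""
        cov = np.linalg.inv(self.precision)
        mean = x @ cov @ self.b
        var = x @ cov @ x + self.noise_var
        return mean, var


def stream_active_learning(stream, oracle, model, threshold, budget):
    """Query a label whenever the predictive variance exceeds `threshold`.

    `oracle(t)` is a hypothetical callback returning the label of the t-th
    point (for example, the result of a quality inspection).
    Returns the indices of the points whose labels were requested.
    """
    queried = []
    for t, x in enumerate(stream):
        if len(queried) >= budget:
            break  # labeling budget exhausted
        _, var = model.predict(x)
        if var > threshold:
            model.update(x, oracle(t))
            queried.append(t)
    return queried


# Usage on synthetic data (the data-generating process is an assumption).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
w_true = np.array([1.0, -2.0, 0.5])
y = X @ w_true + 0.1 * rng.normal(size=200)

model = BayesianLinearRegressor(n_features=3, noise_var=0.01)
queried = stream_active_learning(
    X, lambda t: y[t], model, threshold=0.05, budget=20
)
print(f"queried {len(queried)} of {len(X)} points")
```

Unlike offline (pool-based) selection, the decision for each point here is made immediately and cannot be revisited, which is why a threshold on an informativeness score, rather than a ranking over a pool, is used in this sketch.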