Modern industrial facilities generate large volumes of raw sensor data during production. This data is used to monitor and control processes and can be analyzed to detect and predict process abnormalities. Typically, the data must be annotated by experts before it can be used in predictive modeling. However, manually annotating large amounts of data can be difficult in industrial settings. In this paper, we propose SensorSCAN, a novel method for unsupervised fault detection and diagnosis, designed for industrial chemical process monitoring. We demonstrate our model's performance on two publicly available datasets of the Tennessee Eastman Process with various faults. The results show that our method significantly outperforms existing approaches, improving the true positive rate (TPR) by 0.2-0.3 at a fixed false positive rate (FPR), and effectively detects most of the process faults without expert annotation. Moreover, we show that the model fine-tuned on a small fraction of labeled data nearly reaches the performance of a state-of-the-art model trained on the full dataset. We also demonstrate that our method is suitable for real-world applications where the number of faults is not known in advance. The code is available at https://github.com/AIRI-Institute/sensorscan.
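For readers unfamiliar with the reported metric, the following is a minimal sketch of how a TPR at a fixed FPR can be computed from anomaly scores. It is not taken from the SensorSCAN repository: the function name `tpr_at_fixed_fpr`, the assumption that higher scores mean "more likely faulty", and the toy scores are all hypothetical and serve only as illustration.

```python
import math

def tpr_at_fixed_fpr(normal_scores, faulty_scores, target_fpr):
    """Pick a threshold on normal-operation scores so that the false positive
    rate does not exceed target_fpr, then measure the true positive rate.

    Assumption (not from the paper): a sample is flagged as faulty when its
    score is strictly greater than the threshold.
    """
    if not normal_scores or not faulty_scores:
        raise ValueError("both score lists must be non-empty")
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")

    ranked = sorted(normal_scores, reverse=True)
    # Number of normal samples we may flag without exceeding the target FPR.
    allowed = min(math.floor(target_fpr * len(ranked)), len(ranked) - 1)
    # The (allowed + 1)-th largest normal score: at most `allowed` normal
    # samples lie strictly above it.
    threshold = ranked[allowed]

    fpr = sum(s > threshold for s in normal_scores) / len(normal_scores)
    tpr = sum(s > threshold for s in faulty_scores) / len(faulty_scores)
    return threshold, fpr, tpr

# Toy scores, purely illustrative.
normal = list(range(100))      # scores 0..99 from normal operation
faulty = list(range(90, 110))  # scores 90..109 from faulty operation
threshold, fpr, tpr = tpr_at_fixed_fpr(normal, faulty, target_fpr=0.05)
print(threshold, fpr, tpr)
```

Because the threshold is fixed from normal-operation scores alone, two detectors compared this way raise the same rate of false alarms, so a difference in TPR reflects detection ability rather than a difference in operating point.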