Identifying outlier behavior among sensors and subsystems is essential for discovering faults and facilitating diagnostics in large systems. At the same time, exploring large systems with numerous multivariate data sets is challenging. This study presents a lightweight interconnection and divergence discovery mechanism (LIDD) to identify abnormal behavior in multi-system environments. The approach uses a multivariate analysis technique that first estimates a similarity heatmap among the sensors of each system and then applies information retrieval algorithms to extract multi-level interconnection and discrepancy details. Our experiment on the readout systems of the Hadron Calorimeter of the Compact Muon Solenoid (CMS) experiment at CERN demonstrates the effectiveness of the proposed method. Our approach groups readout systems and their sensors into clusters consistent with the expected calorimeter interconnection configurations, while capturing unusual behavior in divergent clusters and estimating its root causes.