The Compact Muon Solenoid (CMS) experiment is a general-purpose detector for high-energy collisions at the Large Hadron Collider (LHC) at CERN. It employs an online data quality monitoring (DQM) system to promptly spot and diagnose particle data acquisition problems and so avoid data quality loss. In this study, we present a semi-supervised spatio-temporal anomaly detection (AD) monitoring system for the physics particle reading channels of the Hadron Calorimeter (HCAL) of the CMS, using three-dimensional digi-occupancy map data from the DQM. We propose the GraphSTAD system, which employs convolutional and graph neural networks to learn, respectively, the local spatial characteristics induced by particles traversing the detector and the global behavior arising from the shared backend circuit connections and housing boxes of the channels. Recurrent neural networks capture the temporal evolution of the extracted spatial features. We validate the accuracy of the proposed AD system in capturing diverse channel fault types using LHC Run-2 collision data sets. The GraphSTAD system has achieved production-level accuracy and is being integrated into the CMS core production system for real-time monitoring of the HCAL. We also provide a quantitative performance comparison with alternative benchmark models to demonstrate the advantages of the presented system.
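To make the idea of spatio-temporal anomaly scoring concrete, the following is a minimal toy sketch and not the GraphSTAD model. It assumes a flat list of channels, a hypothetical `box_of_channel` mapping from each channel to its readout box, and occupancy histories taken from healthy running only. An exponential moving average stands in for the recurrent network, and plain mean aggregation over channels that share a box stands in for the graph network; the convolutional stage is omitted. All function names, the smoothing factor, and the threshold are illustrative assumptions.

```python
from collections import defaultdict


def neighbours_by_box(box_of_channel):
    """Map each channel to the channels sharing its readout box (itself included)."""
    members = defaultdict(list)
    for channel, box in enumerate(box_of_channel):
        members[box].append(channel)
    return [members[box] for box in box_of_channel]


def ema_forecast(history, alpha=0.5):
    """Per-channel exponential moving average over past frames (RNN stand-in)."""
    forecast = list(history[0])
    for frame in history[1:]:
        forecast = [alpha * x + (1 - alpha) * f for x, f in zip(frame, forecast)]
    return forecast


def anomaly_scores(history, current, box_of_channel, alpha=0.5):
    """Score each channel by |observed - expected| occupancy.

    The expected value is the temporal forecast averaged over the channels
    in the same box (one round of mean aggregation, a GNN stand-in).
    """
    forecast = ema_forecast(history, alpha)
    groups = neighbours_by_box(box_of_channel)
    expected = [sum(forecast[j] for j in group) / len(group) for group in groups]
    return [abs(obs - exp) for obs, exp in zip(current, expected)]


def flag_anomalies(scores, threshold):
    """Return the indices of channels whose score exceeds the threshold."""
    return [i for i, score in enumerate(scores) if score > threshold]


# Toy data: four channels in two boxes, three healthy frames of history.
history = [[1.0, 1.0, 1.0, 1.0]] * 3
current = [1.0, 0.0, 1.0, 1.0]  # channel 1 reads nothing in the new frame
boxes = [0, 0, 1, 1]  # hypothetical channel-to-box assignment

scores = anomaly_scores(history, current, boxes)
flagged = flag_anomalies(scores, threshold=0.5)
print(flagged)
```

Because the expected occupancy is built only from healthy history, the sketch mirrors the semi-supervised setting: no labeled faults are needed, and a channel is flagged when it deviates from what its own past and its box neighbours predict.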