Spatio-Temporal Attention Graph Neural Network: Explaining Causalities With Attention

Industrial Control Systems (ICS) underpin critical infrastructure and face growing cyber-physical threats due to the convergence of operational technology and networked environments. While machine learning-based anomaly detection approaches for ICS show strong theoretical performance, their deployment is often limited by poor explainability, high false-positive rates, and sensitivity to evolving system behavior, i.e., baseline drift. We propose a Spatio-Temporal Attention Graph Neural Network (STA-GNN) for unsupervised and explainable anomaly detection in ICS that models both the temporal dynamics and the relational structure of the system. Sensors, controllers, and network entities are represented as nodes in a dynamically learned graph, enabling the model to capture inter-dependencies across physical processes and communication patterns. Attention weights highlight the most influential relationships, supporting inspection of the correlations and potential causal pathways behind detected events. The approach supports multiple data modalities, including SCADA point measurements, network flow features, and payload features, and thus enables unified cyber-physical analysis. To address operational requirements, we incorporate a conformal prediction strategy that controls the false alarm rate and monitors performance degradation under environmental drift. Our findings highlight the possibilities and limitations of model evaluation, expose common pitfalls in ICS anomaly detection, and emphasize the importance of explainable, drift-aware evaluation for the reliable deployment of learning-based security monitoring systems.
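The abstract does not say how the conformal prediction strategy is realized. A minimal sketch of one standard option, split conformal thresholding of anomaly scores, is shown below. It assumes that the model emits a scalar anomaly score per time window (for example a forecasting or reconstruction error) and that a held-out calibration set of scores from normal operation is available. If calibration scores and new normal-period scores are exchangeable, alarming only above the threshold bounds the false alarm rate by alpha. The function names and the AlarmRateMonitor drift heuristic (a windowed alarm rate compared against alpha) are hypothetical illustrations, not the authors' method.

```python
import math
from collections import deque
from typing import Deque, Sequence


def conformal_threshold(calibration_scores: Sequence[float], alpha: float) -> float:
    """Split-conformal alarm threshold (illustrative sketch).

    If calibration scores and a new normal-period score are exchangeable,
    P(new score > threshold) <= alpha, i.e. the false alarm rate is bounded.
    """
    n = len(calibration_scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    # Rank of the order statistic used as threshold: k = ceil((n + 1)(1 - alpha)).
    k = math.ceil((n + 1) * (1.0 - alpha))
    if k > n:
        # Too few calibration points for this alpha: never raise an alarm.
        return math.inf
    return sorted(calibration_scores)[k - 1]


def conformal_p_value(calibration_scores: Sequence[float], score: float) -> float:
    """Conformal p-value: share of calibration scores (plus the new one) at least as large.

    Alarming when p <= alpha matches alarming above conformal_threshold (up to ties).
    """
    n = len(calibration_scores)
    at_least = sum(1 for c in calibration_scores if c >= score)
    return (1 + at_least) / (n + 1)


class AlarmRateMonitor:
    """Hypothetical drift heuristic: flag when the alarm rate over a sliding
    window of presumed-normal data exceeds alpha by more than `tolerance`."""

    def __init__(self, alpha: float, window: int = 1000, tolerance: float = 0.05) -> None:
        self.alpha = alpha
        self.tolerance = tolerance
        self.recent: Deque[bool] = deque(maxlen=window)

    def update(self, alarm: bool) -> bool:
        self.recent.append(alarm)
        if len(self.recent) < self.recent.maxlen:
            return False  # window not yet full: not enough evidence
        rate = sum(self.recent) / len(self.recent)
        return rate > self.alpha + self.tolerance


if __name__ == "__main__":
    calib = [float(s) for s in range(1, 10)]  # toy scores on held-out normal data
    print(conformal_threshold(calib, alpha=0.25))  # 8.0
    print(conformal_p_value(calib, 8.5))  # 0.2
```

A sustained alarm rate well above alpha on traffic believed to be normal suggests that the exchangeability assumption no longer holds, which is one simple way to surface baseline drift; recalibrating on recent normal data is the usual remedy under this assumption.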
