We introduce SigNova, a new semi-supervised framework for detecting anomalies in streamed data. Although our initial examples focus on detecting radio-frequency interference (RFI) in digitized signals in radio astronomy, SigNova applies to any type of streamed data. The framework comprises three primary components. First, we use the signature transform to extract a canonical collection of summary statistics from observational sequences, which allows us to represent variable-length visibility samples as finite-dimensional feature vectors. Second, each feature vector is assigned a novelty score, calculated as the Mahalanobis distance to its nearest neighbor in an RFI-free training set. By thresholding these scores, we identify observation ranges that deviate from the expected behavior of RFI-free visibility samples without relying on stringent distributional assumptions. Third, we integrate this anomaly detector with Pysegments, a segmentation algorithm, to localize consecutive observations contaminated with RFI, if any. This approach provides a compelling alternative to the classical windowing techniques commonly used for RFI detection. Importantly, the complexity of our algorithm depends on the RFI pattern rather than on the size of the observation window. We demonstrate how SigNova improves the detection of various types of RFI (e.g., broadband and narrowband) in time-frequency visibility data. We validate our framework on data from the Murchison Widefield Array (MWA) telescope, on simulated data, and on the Hydrogen Epoch of Reionization Array (HERA).
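To make the first two components concrete, the following is a minimal sketch, not SigNova's actual implementation. It assumes a signature truncated at depth 2, a two-channel real-valued path (for instance the real and imaginary parts of a visibility), and a plain Mahalanobis distance built from the pseudo-inverse of the training-feature covariance. The function names, the synthetic random-walk data, and the choice of threshold (the largest score on held-out clean samples) are all hypothetical choices made for illustration.

```python
import numpy as np

def signature_depth2(path):
    """Depth-2 truncated signature of a piecewise-linear path of shape (T, d).

    Returns a vector of length d + d*d regardless of T, so paths of
    different lengths map to feature vectors of the same dimension.
    """
    path = np.asarray(path, dtype=float)
    increments = np.diff(path, axis=0)             # (T-1, d)
    level1 = increments.sum(axis=0)                # total displacement
    start_offsets = path[:-1] - path[0]            # position at start of each segment
    # Iterated integrals of a piecewise-linear path (Chen's identity).
    level2 = start_offsets.T @ increments + 0.5 * increments.T @ increments
    return np.concatenate([level1, level2.ravel()])

def fit_reference(clean_paths):
    """Signature features of the clean set and an (assumed) precision matrix."""
    feats = np.stack([signature_depth2(p) for p in clean_paths])
    precision = np.linalg.pinv(np.cov(feats, rowvar=False))
    return feats, precision

def novelty_score(path, feats, precision):
    """Mahalanobis distance from the path's signature to its nearest clean neighbor."""
    diffs = feats - signature_depth2(path)
    sq_dists = np.einsum("ni,ij,nj->n", diffs, precision, diffs)
    return float(np.sqrt(max(sq_dists.min(), 0.0)))

# Synthetic stand-in for RFI-free samples: random walks of varying length.
rng = np.random.default_rng(0)

def clean_path():
    length = int(rng.integers(20, 40))
    return np.cumsum(rng.normal(size=(length, 2)), axis=0)

train = [clean_path() for _ in range(200)]
calibration = [clean_path() for _ in range(50)]
feats, precision = fit_reference(train)

# Hypothetical threshold: the largest score seen on held-out clean data.
threshold = max(novelty_score(p, feats, precision) for p in calibration)

# Hypothetical contamination: a large persistent jump in one channel.
contaminated = clean_path()
contaminated[10:, 0] += 1000.0
is_flagged = novelty_score(contaminated, feats, precision) > threshold
```

The threshold here is calibrated only on clean data, which mirrors the semi-supervised setting described above: no labeled RFI examples are needed to fit the detector. A practical system would likely use a deeper signature and a dedicated signature library; the depth-2 NumPy version is used only to keep the sketch self-contained.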