With the proliferation of mobile sensing techniques, huge amounts of time series data are generated and accumulated in various domains, fueling many real-world applications. In this setting, time series anomaly detection, which aims to identify samples that deviate from the normal sample distribution, is of practical importance. Existing approaches generally assume that all time series are available at a central location. However, with the deployment of various edge devices, time series are increasingly collected in a decentralized manner. To bridge the gap between decentralized time series data and centralized anomaly detection algorithms, and in response to growing privacy concerns, we propose a Parameter-efficient Federated Anomaly Detection framework named PeFAD. PeFAD is the first to employ a pre-trained language model (PLM) as the body of each client's local model, benefiting from the PLM's cross-modality knowledge transfer capability. To reduce the communication overhead and local model adaptation cost, we propose a parameter-efficient federated training module in which clients fine-tune only a small number of parameters and transmit them to the server for update. PeFAD uses a novel anomaly-driven mask selection strategy to mitigate the impact of neglected anomalies during training. To address data heterogeneity across clients, we also propose a knowledge distillation operation on a synthetic privacy-preserving dataset shared by all clients. We conduct extensive evaluations on four real datasets, where PeFAD outperforms existing state-of-the-art baselines by up to 28.74%.
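The parameter-efficient federated training idea can be illustrated with a minimal sketch: each client keeps a large frozen backbone, fine-tunes only a small set of parameters, and uploads just that subset, which the server then aggregates. The abstract does not state the aggregation rule, so the sketch below assumes a FedAvg-style weighted average by local sample count. All names (`trainable_subset`, `federated_average`, `apply_update`, the `"backbone"` and `"adapter"` keys) are hypothetical and are not taken from PeFAD itself.

```python
from typing import Dict, List

# A model is represented as a mapping from parameter name to a flat list
# of floats. This is a toy stand-in for real tensors (assumption).
Params = Dict[str, List[float]]


def trainable_subset(params: Params, trainable_keys: List[str]) -> Params:
    """Return copies of only the parameters a client fine-tunes and uploads."""
    return {key: list(params[key]) for key in trainable_keys}


def federated_average(updates: List[Params], weights: List[float]) -> Params:
    """Weighted element-wise average of client updates (FedAvg-style, assumed)."""
    total = sum(weights)
    averaged: Params = {}
    for key in updates[0]:
        length = len(updates[0][key])
        averaged[key] = [
            sum(w * u[key][i] for u, w in zip(updates, weights)) / total
            for i in range(length)
        ]
    return averaged


def apply_update(params: Params, update: Params) -> Params:
    """Overwrite only the aggregated parameters; frozen ones stay untouched."""
    merged = {key: list(value) for key, value in params.items()}
    for key, value in update.items():
        merged[key] = list(value)
    return merged


# Two hypothetical clients that share a frozen backbone and differ only in
# their locally fine-tuned adapter parameters.
client_a: Params = {"backbone": [0.5, 0.5], "adapter": [1.0, 2.0]}
client_b: Params = {"backbone": [0.5, 0.5], "adapter": [3.0, 4.0]}

# Only the adapter is transmitted, which is what keeps communication small.
uploads = [
    trainable_subset(client_a, ["adapter"]),
    trainable_subset(client_b, ["adapter"]),
]

# Weight each client by its (made-up) number of local samples.
global_update = federated_average(uploads, weights=[100.0, 300.0])
client_a_next = apply_update(client_a, global_update)
```

In this toy setup the upload contains only the `"adapter"` entry, so the communication cost scales with the number of fine-tuned parameters rather than with the size of the backbone.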