This paper studies how to develop accurate and interpretable time series classification (TSC) models with the help of external data in a privacy-preserving federated learning (FL) scenario. To the best of our knowledge, we are the first to study this essential topic. Achieving this goal requires seamlessly integrating techniques from multiple fields, including data mining, machine learning, and security. We formulate the problem and identify the interpretability constraints under the FL setting. We systematically investigate existing TSC solutions for the centralized scenario and propose FedST, a novel FL-enabled TSC framework based on a shapelet transformation method. We identify the federated shapelet search step as the kernel of FedST and design FedSS-B, a basic protocol for this kernel that we prove to be secure and accurate. We then identify the efficiency bottlenecks of the basic protocol and propose acceleration optimizations tailored to the FL setting. Our theoretical analysis shows that the proposed optimizations are secure and improve efficiency. We conduct extensive experiments on both synthetic and real-world datasets. Empirical results show that FedST is effective in terms of TSC accuracy and that the proposed optimizations can achieve a speedup of three orders of magnitude.
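For readers unfamiliar with the shapelet transformation mentioned above, the following is a minimal sketch of the standard centralized version of the idea: each series is mapped to a feature vector whose entries are the minimum Euclidean distances between the series and each shapelet over all sliding windows. This is not the FedST or FedSS-B protocol, which additionally requires secure computation across parties. The function names `shapelet_distance` and `shapelet_transform` are hypothetical, and the unnormalized Euclidean distance is an assumption made for illustration.

```python
import numpy as np


def shapelet_distance(series, shapelet):
    """Minimum Euclidean distance between a shapelet and any
    equal-length contiguous subsequence of the series."""
    series = np.asarray(series, dtype=float)
    shapelet = np.asarray(shapelet, dtype=float)
    if len(shapelet) > len(series):
        raise ValueError("shapelet must not be longer than the series")
    # All contiguous windows of the series with the shapelet's length.
    windows = np.lib.stride_tricks.sliding_window_view(series, len(shapelet))
    return float(np.min(np.linalg.norm(windows - shapelet, axis=1)))


def shapelet_transform(dataset, shapelets):
    """Map each series to a vector of distances to every shapelet.

    The result has shape (number of series, number of shapelets) and can
    be fed to any standard classifier; each feature stays interpretable
    as "how closely this series matches that shapelet".
    """
    return np.array(
        [[shapelet_distance(s, sh) for sh in shapelets] for s in dataset]
    )
```

In a federated setting, the series and candidate shapelets would be held by different parties, so the distances in this sketch could not be computed in the clear as shown here.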