CELEST: Federated Learning for Globally Coordinated Threat Detection

The cyber-threat landscape has evolved tremendously in recent years, with new threat variants emerging daily and large-scale coordinated campaigns becoming more prevalent. In this study, we propose CELEST (CollaborativE LEarning for Scalable Threat detection), a federated machine learning framework for global threat detection over HTTP, one of the most commonly used protocols for malware dissemination and communication. CELEST uses federated learning to collaboratively train a global model across multiple clients that keep their data local, thus providing increased privacy and confidentiality assurances. Through a novel active learning component integrated with the federated learning technique, our system continuously discovers and learns the behavior of new, evolving, and globally coordinated cyber threats. We show that CELEST is able to expose attacks that are largely invisible to individual organizations. For instance, in one challenging attack scenario with data exfiltration malware, the global model achieves a three-fold increase in Precision-Recall AUC compared to the local model. We also propose DTrust, a poisoning detection and mitigation method designed specifically for federated learning in the collaborative threat detection domain. DTrust uses feedback from participating clients to detect poisoning clients, investigate them, and remove them from the training process. We deploy CELEST on two university networks and show that it detects malicious HTTP communication with high precision and low false positive rates. Furthermore, during its deployment, CELEST detected 42 previously unknown malicious URLs and 20 malicious domains in one day, all of which were confirmed as malicious by VirusTotal.
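As a rough illustration of the collaborative training idea described above, the following minimal sketch shows one common federated scheme, federated averaging, in which each client trains on its own data and only model weights are shared. The abstract does not state which aggregation rule or model CELEST uses, so the averaging rule, the logistic-regression stand-in, and all function names (`local_update`, `federated_average`, `train_global`) are assumptions made for illustration only.

```python
import math


def local_update(weights, data, lr=0.5, epochs=5):
    """One client's local step: logistic-regression SGD on its private data.

    `data` is a list of (features, label) pairs and never leaves the client.
    The model here is a hypothetical stand-in, not the CELEST model.
    """
    w = list(weights)
    for _ in range(epochs):
        for features, label in data:
            z = sum(wi * xi for wi, xi in zip(w, features))
            z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
            pred = 1.0 / (1.0 + math.exp(-z))
            err = pred - label
            w = [wi - lr * err * xi for wi, xi in zip(w, features)]
    return w


def federated_average(client_weights, client_sizes):
    """Server-side aggregation: average client weights, weighted by data size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[j] * n for w, n in zip(client_weights, client_sizes)) / total
        for j in range(dim)
    ]


def train_global(clients, dim, rounds=10):
    """Run several rounds; only weights (never raw data) reach the server."""
    global_w = [0.0] * dim
    for _ in range(rounds):
        updates = [local_update(global_w, data) for data in clients]
        sizes = [len(data) for data in clients]
        global_w = federated_average(updates, sizes)
    return global_w


if __name__ == "__main__":
    # Toy data: features are [bias, x]; label is 1 when x > 0.
    client_a = [([1.0, 1.0], 1), ([1.0, -1.0], 0)]
    client_b = [([1.0, 2.0], 1), ([1.0, -2.0], 0)]
    print(train_global([client_a, client_b], dim=2))
```

In this sketch the server sees only weight vectors, which is the property the abstract relies on for privacy; the active learning component and DTrust are not modeled here.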
