Intrusion detection is a long-standing and crucial problem in security. Systems capable of detecting intrusions automatically are in great demand in enterprise security solutions. Existing solutions rely heavily on hand-crafted rules designed by security operators, which suffer from high false negative rates and generalize poorly to new, zero-day attacks at scale. AI and machine learning offer promising ways to address these issues by detecting abnormal user behavior intelligently and automatically from data. However, existing learning-based intrusion detection systems in the literature are mostly designed for small data and cannot leverage the big data available in cloud environments. In this paper, we target this problem and introduce an intrusion detection system that incorporates large-scale pre-training: a large language model is trained on tens of millions of command lines for AI-based intrusion detection. Experiments on 30 million training samples and 10 million test samples verify the effectiveness of our solution.