Background: Cyber-attacks have evolved rapidly in recent years, and many individuals and business owners have been affected by them in various ways. Cyber-attacks include threats such as ransomware, malware, phishing, and Denial of Service (DoS) attacks. Challenges: Traditional models such as Generative Artificial Intelligence (AI) and Security Bidirectional Encoder Representations from Transformers (BERT) have been implemented to detect cyber threats. However, the existing Security BERT model has a limited contextual understanding of text data, which reduces its effectiveness in detecting cyber-attacks. Proposed Methodology: To overcome the above-mentioned challenges, the Robustly Optimized BERT Pretraining Approach (RoBERTa) model is proposed, which provides broader vocabulary coverage and richer contextual understanding. Initially, data are extracted from a Packet Capture (PCAP) file and encrypted using Fully Homomorphic Encryption (FHE). Subsequently, a Byte-level Byte Pair Encoding (BBPE) tokenizer is used to generate tokens and maintain a vocabulary over the encrypted values. These tokens are then fed to the transformer-based RoBERTa model for extensive training. Finally, Softmax is used for the detection and classification of attacks. The proposed RoBERTa model outperformed the existing BERT model, achieving an accuracy of 0.99, a recall of 0.91, and a precision of 0.89.
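Two stages of the pipeline above lend themselves to a compact illustration: the byte-level half of BBPE tokenization (every input maps to byte IDs, so no token is ever out-of-vocabulary) and the final Softmax classification layer. The sketch below is illustrative only, in pure Python; the pair-merging step of BBPE, the FHE encryption, and the RoBERTa model itself are omitted, and the class logits are hypothetical values, not outputs of the trained model.

```python
import math

def byte_level_tokens(text: str) -> list[int]:
    # Byte-level first stage of a BBPE tokenizer: map the input string
    # to a sequence of byte IDs (0-255). The subsequent BPE pair-merging
    # stage, which builds larger subword units, is omitted in this sketch.
    return list(text.encode("utf-8"))

def softmax(logits: list[float]) -> list[float]:
    # Numerically stable softmax: subtract the max logit before
    # exponentiating, then normalize to a probability distribution.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical classifier-head logits for four attack classes
# (ransomware, malware, phishing, DoS) -- illustrative values only.
logits = [2.1, 0.3, -1.0, 0.5]
probs = softmax(logits)
predicted = probs.index(max(probs))  # highest-probability class index
```

In the actual pipeline, the logits would come from a classification head on top of RoBERTa's pooled output; Softmax then turns them into per-class attack probabilities, and the argmax gives the predicted attack type.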