Email remains a pivotal and widely used communication medium in professional and commercial domains. However, the prevalence of spam poses a significant challenge for users, disrupting their daily routines and reducing productivity. Accurately identifying and filtering spam based on content has therefore become crucial for cybersecurity. Recent advances in natural language processing, particularly large language models such as ChatGPT, have shown remarkable performance in tasks such as question answering and text generation. However, ChatGPT's potential for spam identification remains underexplored. To fill this gap, this study evaluates ChatGPT's capabilities for spam identification on both English and Chinese email datasets. We employ ChatGPT for spam email detection using in-context learning, which requires a prompt instruction and a few demonstrations. We also investigate how the number of demonstrations affects ChatGPT's performance. For comparison, we implement five popular baseline methods: naive Bayes, support vector machines (SVM), logistic regression (LR), feedforward dense neural networks (DNN), and BERT classifiers. Through extensive experiments, we find that ChatGPT performs significantly worse than deep supervised learning methods on the large English dataset, while it achieves superior performance on the low-resource Chinese dataset, even outperforming BERT in this case.
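To make the in-context learning setup concrete, the following is a minimal sketch of how a prompt instruction and a few labeled demonstrations might be assembled into a single query. The instruction wording, the `Email:`/`Label:` layout, the `spam`/`ham` label names, and the helper names `build_prompt` and `parse_label` are all illustrative assumptions, not the prompt or code used in the study. The call to the model itself is deliberately omitted.

```python
from typing import List, Tuple

# Hypothetical instruction text; the study's actual prompt is not given here.
INSTRUCTION = (
    "Classify the following email as 'spam' or 'ham'. "
    "Answer with a single word."
)


def build_prompt(demos: List[Tuple[str, str]], email: str) -> str:
    """Join the instruction, k labeled demonstrations, and the query email."""
    parts = [INSTRUCTION, ""]
    for text, label in demos:
        parts.append(f"Email: {text}")
        parts.append(f"Label: {label}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Email: {email}")
    parts.append("Label:")  # left open for the model to complete
    return "\n".join(parts)


def parse_label(reply: str) -> str:
    """Map a free-text reply to 'spam' or 'ham' using its first word.

    Anything that does not start with 'spam' falls back to 'ham'
    (an assumed default, chosen only for this sketch).
    """
    tokens = reply.strip().lower().split()
    first = tokens[0].strip(".,!:;'\"") if tokens else ""
    return "spam" if first == "spam" else "ham"


# Toy demonstrations, invented for illustration only.
demos = [
    ("Win a free prize now, click here!", "spam"),
    ("Meeting moved to 3pm tomorrow.", "ham"),
]
prompt = build_prompt(demos, "Claim your reward today")
```

Varying the length of `demos` is one way to study how the number of demonstrations affects performance: the instruction and query stay fixed while only the in-context examples change.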