Email remains a pivotal and widely used communication medium in professional and commercial settings. However, the prevalence of spam poses a significant challenge for users, disrupting their daily routines and reducing productivity. Consequently, accurately identifying and filtering spam based on content has become crucial for cybersecurity. Recent advances in natural language processing, particularly large language models such as ChatGPT, have shown remarkable performance in tasks such as question answering and text generation. However, the potential of such models for spam identification remains underexplored. To fill this gap, this study evaluates ChatGPT's capabilities for spam identification on both English and Chinese email datasets. We employ ChatGPT for spam email detection using in-context learning, which requires a prompt instruction and a few demonstrations. We also investigate how the number of demonstrations in the prompt affects ChatGPT's performance. For comparison, we implement five popular benchmark methods: naive Bayes, support vector machines (SVM), logistic regression (LR), feedforward dense neural networks (DNN), and BERT classifiers. Extensive experiments show that ChatGPT performs significantly worse than deep supervised learning methods on the large English dataset, while it achieves superior performance on the low-resource Chinese dataset.
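The in-context learning setup described in the abstract (a prompt instruction followed by a few labeled demonstrations and then the email to classify) can be illustrated with a minimal sketch. The abstract does not give the actual prompt wording, so the template, the `spam`/`ham` label names, and the helper `build_prompt` are all hypothetical; no model API is called here, and the parameter `k` simply stands in for the number of demonstrations whose effect the study investigates.

```python
from typing import List, Tuple


def build_prompt(
    instruction: str,
    demonstrations: List[Tuple[str, str]],
    email: str,
    k: int,
) -> str:
    """Assemble a k-shot prompt (hypothetical template, not the paper's).

    The prompt consists of the instruction, the first k labeled
    demonstrations, and the query email with an empty label slot
    for the model to complete.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    parts = [instruction.strip()]
    # Each demonstration is an (email text, label) pair.
    for text, label in demonstrations[:k]:
        parts.append(f"Email: {text}\nLabel: {label}")
    # The query email ends with an open label for the model to fill in.
    parts.append(f"Email: {email}\nLabel:")
    return "\n\n".join(parts)


# Illustrative demonstrations; these are made up, not taken from the datasets.
demos = [
    ("Win a free prize now", "spam"),
    ("Meeting moved to 3pm", "ham"),
]
instruction = "Classify the email as spam or ham."
```

Calling `build_prompt(instruction, demos, "Hello", k)` with `k = 0, 1, 2` produces zero-shot, one-shot, and two-shot prompts respectively, which is one simple way to vary the number of demonstrations while keeping the instruction fixed.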