This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. As baselines, we also examine well-established machine learning techniques for spam detection, such as Naïve Bayes and LightGBM. We assess the performance of these models on four public datasets, using different numbers of training samples (the full training set and few-shot settings). Our findings reveal that, in the majority of cases, LLMs outperform the popular baseline techniques, particularly in few-shot scenarios. This adaptability renders LLMs uniquely suited to spam detection tasks, where labeled samples are limited in number and models require frequent updates. We further introduce Spam-T5, a Flan-T5 model that has been specifically adapted and fine-tuned for detecting email spam. Our results demonstrate that Spam-T5 outperforms the baseline models and the other LLMs in the majority of scenarios, particularly when only a limited number of training samples is available. Our code is publicly available at https://github.com/jpmorganchase/emailspamdetection.