This paper investigates the effectiveness of large language models (LLMs) in email spam detection by comparing prominent models from three distinct families: BERT-like, Sentence Transformers, and Seq2Seq. We also evaluate well-established machine learning techniques for spam detection, such as Naïve Bayes and LightGBM, as baselines. We assess all models on four public datasets with varying numbers of training samples (the full training set and few-shot settings). Our findings reveal that, in the majority of cases, LLMs outperform the popular baseline techniques, particularly in few-shot scenarios. This adaptability makes LLMs uniquely suited to spam detection, where labeled samples are scarce and models require frequent updates. Additionally, we introduce Spam-T5, a Flan-T5 model adapted and fine-tuned specifically for detecting email spam. Our results demonstrate that Spam-T5 surpasses the baseline models and the other LLMs in the majority of scenarios, particularly when only a limited number of training samples is available. Our code is publicly available at https://github.com/jpmorganchase/emailspamdetection.