Can we trust Large Language Models (LLMs) to accurately detect scams? This paper investigates the vulnerabilities of LLMs when facing adversarial scam messages in the scam detection task. We address this issue by creating a comprehensive dataset of scam messages with fine-grained labels, covering both original and adversarial scam messages. The dataset extends the traditional binary classification of scam detection into more nuanced scam types. Our analysis shows how adversarial examples exploit the vulnerabilities of an LLM, leading to high misclassification rates. We evaluate the performance of LLMs on these adversarial scam messages and propose strategies to improve their robustness.