Scams exploiting real-time social engineering -- such as phishing, impersonation, and phone fraud -- remain a persistent and evolving threat across digital platforms. Existing defenses are largely reactive, offering limited protection during active interactions. We propose a privacy-preserving, AI-in-the-loop framework that proactively detects and disrupts scam conversations in real time. The system combines instruction-tuned AI with a safety-aware utility function that balances engagement against harm minimization, and employs federated learning to enable continual model updates without sharing raw data. Experimental evaluations show that the system produces fluent and engaging responses (perplexity as low as 22.3, engagement $\approx$0.80), while human studies confirm significant gains in realism, safety, and effectiveness over strong baselines. In federated settings, models trained with FedAvg remain stable for up to 30 rounds while preserving high engagement ($\approx$0.80), strong relevance ($\approx$0.74), and low PII leakage ($\leq$0.0085). Even with differential privacy, novelty and safety remain stable, indicating that robust privacy can be achieved without sacrificing performance. Evaluating guard models (LlamaGuard, LlamaGuard2/3, MD-Judge) reveals a clear trade-off: stricter moderation settings reduce the risk of exposing personal information but limit conversational engagement, whereas more relaxed settings permit longer, richer interactions that improve scam detection at the cost of higher privacy risk. To our knowledge, this is the first framework to unify real-time scam-baiting, federated privacy preservation, and calibrated safety moderation into a proactive defense paradigm.
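To make the two core mechanisms concrete, the sketch below illustrates (i) a safety-aware utility of the kind described, scoring candidate replies by rewarding engagement while penalizing expected harm, and (ii) FedAvg aggregation, which averages client model parameters weighted by local dataset size so raw conversation data never leaves each client. The function names, the linear form of the utility, and the weights `alpha` and `beta` are illustrative assumptions, not the paper's actual implementation.

```python
from typing import Dict, List


def safety_aware_utility(engagement: float, harm: float,
                         alpha: float = 1.0, beta: float = 2.0) -> float:
    """Score a candidate reply: reward engagement, penalize expected harm.

    alpha and beta are hypothetical weights; setting beta > alpha encodes
    a safety-first preference, so a risky-but-engaging reply can score
    below a safe, moderately engaging one.
    """
    return alpha * engagement - beta * harm


def fedavg(client_weights: List[Dict[str, float]],
           client_sizes: List[int]) -> Dict[str, float]:
    """One round of FedAvg: average each parameter across clients,
    weighted by local dataset size. Only model parameters are shared,
    never the underlying scam conversations."""
    total = sum(client_sizes)
    keys = client_weights[0].keys()
    return {k: sum(w[k] * n for w, n in zip(client_weights, client_sizes)) / total
            for k in keys}


# Illustrative comparison: a safe reply outranks a riskier one.
u_safe = safety_aware_utility(engagement=0.8, harm=0.1)   # 0.8 - 0.2 = 0.6
u_risky = safety_aware_utility(engagement=0.9, harm=0.5)  # 0.9 - 1.0 = -0.1

# Illustrative aggregation: the larger client pulls the average toward it.
global_model = fedavg([{"w": 1.0}, {"w": 3.0}], client_sizes=[1, 3])
```

In a real deployment, `engagement` and `harm` would come from learned estimators and the parameter dictionaries from actual model state, but the weighting logic is the same.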