SMS-based phishing (smishing) attacks have surged, yet training effective on-device detectors requires labelled threat data that quickly becomes outdated. To address this, we present Agentic Knowledge Distillation, in which a powerful LLM acts as an autonomous teacher that fine-tunes a smaller student SLM for security tasks without human intervention. The teacher LLM autonomously generates synthetic data and iteratively refines the smaller on-device student model until performance plateaus. We compare four LLMs in this teacher role (Claude Opus 4.5, GPT 5.2 Codex, Gemini 3 Pro, and DeepSeek V3.2) on SMS spam/smishing detection with two student SLMs (Qwen2.5-0.5B and SmolLM2-135M). Our results show that performance varies substantially with the choice of teacher LLM, with the best configuration achieving 94.31% accuracy and 96.25% recall. We also compare against a Direct Preference Optimisation (DPO) baseline that uses the same synthetic knowledge and LoRA setup but omits iterative feedback and targeted refinement; agentic knowledge distillation substantially outperforms it (86-94% vs. 50-80% accuracy), showing that closed-loop feedback and targeted refinement are critical. These findings demonstrate that agentic knowledge distillation can rapidly yield effective security classifiers for edge deployment, although outcomes depend strongly on which teacher LLM is used.
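The closed-loop teacher-student cycle described above can be sketched as follows. This is a minimal illustrative skeleton, not the paper's implementation: the teacher call, the fine-tuning step, and the plateau threshold `eps` are all stubbed or assumed, and a real system would invoke an LLM API for the teacher and LoRA fine-tuning for the student SLM.

```python
# Hypothetical sketch of an agentic knowledge distillation loop:
# teacher synthesises labelled SMS data, the student is tuned on it,
# failures feed back into the next round, and the loop stops at a plateau.

def teacher_generate(n, focus=None):
    """Stub teacher: synthesise n labelled SMS examples; a real teacher LLM
    would target the failure patterns passed in `focus`."""
    base = [("Your parcel is held, pay the fee at hxxp://bit.ly/x", "smish"),
            ("Lunch at 1pm?", "ham")]
    return (base * ((n // len(base)) + 1))[:n]

def finetune_student(student, examples):
    """Stub fine-tune step: a memorised lookup table stands in for a
    LoRA-tuned student SLM."""
    student.update({text: label for text, label in examples})
    return student

def evaluate(student, dev_set):
    """Return accuracy on the dev set plus the misclassified examples,
    which drive the teacher's targeted refinement next round."""
    failures = [(t, y) for t, y in dev_set if student.get(t) != y]
    acc = 1.0 - len(failures) / len(dev_set)
    return acc, failures

def agentic_distillation(dev_set, rounds=5, eps=0.005):
    student, prev_acc, failures = {}, 0.0, None
    acc = 0.0
    for _ in range(rounds):
        data = teacher_generate(4, focus=failures)   # targeted synthesis
        student = finetune_student(student, data)    # refine the student
        acc, failures = evaluate(student, dev_set)
        if acc - prev_acc < eps:                     # performance plateau
            break
        prev_acc = acc
    return student, acc
```

The key design point the abstract attributes to the agentic setup, and which the DPO baseline lacks, is the feedback edge: `failures` from evaluation flow back into `teacher_generate`, so each synthesis round targets what the student still gets wrong.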