Model Multiplicity for Adversarial Detection in Small Language Model Training on Edge Devices

The rise of edge-based machine learning has enabled distributed adaptation of language models across mobile and IoT devices, offering privacy preservation and real-time responsiveness. However, distributed fine-tuning of language models on untrusted or heterogeneous edge nodes introduces new vulnerabilities. Compromised or unreliable devices can inject poisoned updates, leading to stealthy model manipulation or degraded convergence. Classical defenses such as robust aggregation or temporal anomaly detection operate on a single global model and are therefore limited in their ability to detect coordinated or persistent poisoning. This work proposes a new system-level defense based on model multiplicity. Instead of maintaining one global model, the system rotates or concurrently trains multiple small language models (e.g., DistilGPT-2), each updated by an independently sampled subset of edge nodes. These models evolve along distinct training trajectories, creating multiple independent views of the same distributed population. Divergence between models, quantified through gradient similarity, loss evolution, or parameter variance, serves as a signal of anomalous or adversarial behavior. When one model deviates significantly from the ensemble mean, the system flags its contributing nodes for isolation or re-weighting. We implement this framework and evaluate it in edge-scale simulations of Small Language Model (SLM) training under varying heterogeneity and attack conditions. Results show that model multiplicity enables earlier and more reliable detection of poisoning than classical single-model defenses such as Flanders and robust aggregation methods. Our findings demonstrate that diversity in model evolution can serve as a practical and effective defense mechanism for secure distributed learning on resource-constrained edge devices.
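
To make the detection step concrete, the following is a minimal sketch of one way the divergence signal could be computed. It is not the implementation evaluated in this work. It assumes each model is available as a flattened parameter vector, measures each model's Euclidean distance to the ensemble mean, and flags models whose distance is unusually large under a median/MAD rule. The function names, the MAD-based rule, the 3.5 threshold, and the node identifiers are all illustrative assumptions.

```python
import math
import statistics
from typing import Dict, List, Sequence, Set


def ensemble_mean(params: Sequence[Sequence[float]]) -> List[float]:
    """Coordinate-wise mean of the models' flattened parameter vectors."""
    return [statistics.fmean(column) for column in zip(*params)]


def distances_to_mean(params: Sequence[Sequence[float]]) -> List[float]:
    """Euclidean distance of every model to the ensemble mean."""
    mean = ensemble_mean(params)
    return [math.dist(p, mean) for p in params]


def flag_divergent_models(
    params: Sequence[Sequence[float]], threshold: float = 3.5
) -> List[int]:
    """Return indices of models that sit unusually far from the ensemble mean.

    Assumption: "deviates significantly" is approximated by a one-sided
    robust z-score (median and MAD of the distances). The threshold value
    is a hypothetical default, not a value taken from the paper.
    """
    dists = distances_to_mean(params)
    med = statistics.median(dists)
    mad = statistics.median([abs(d - med) for d in dists])
    scale = 1.4826 * mad  # makes MAD comparable to a standard deviation
    if scale == 0.0:
        return []  # distances are (almost) all identical; nothing stands out
    return [i for i, d in enumerate(dists) if (d - med) / scale > threshold]


def nodes_to_review(
    flagged: Sequence[int], assignments: Dict[int, Set[str]]
) -> Set[str]:
    """Union of the edge nodes that updated any flagged model."""
    nodes: Set[str] = set()
    for model_index in flagged:
        nodes |= assignments.get(model_index, set())
    return nodes


# Toy example: four models that agree closely and one that has drifted.
example_params = [
    [0.10, 0.20, 0.30, 0.40],
    [0.11, 0.19, 0.31, 0.39],
    [0.09, 0.21, 0.29, 0.41],
    [0.10, 0.20, 0.31, 0.40],
    [0.90, -0.60, 1.10, -0.40],
]
# Hypothetical mapping from model index to the nodes sampled for it.
example_assignments = {
    0: {"n1", "n2"},
    1: {"n3", "n4"},
    2: {"n5", "n6"},
    3: {"n1", "n8"},
    4: {"n7", "n9"},
}

flagged_models = flag_divergent_models(example_params)
suspect_nodes = nodes_to_review(flagged_models, example_assignments)
```

In this toy example only the last model is flagged, so the nodes sampled for it become candidates for isolation or re-weighting. Gradient similarity or loss evolution could replace the parameter distance without changing the overall structure.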
