Latent Diffusion for Internet of Things Attack Data Generation in Intrusion Detection

Intrusion Detection Systems (IDSs) are a key component for protecting Internet of Things (IoT) environments. However, in Machine Learning-based (ML-based) IDSs, performance is often degraded by the strong class imbalance between benign and attack traffic. Although data augmentation has been widely explored to mitigate this issue, existing approaches typically rely on simple oversampling techniques or on generative models that struggle to achieve high sample fidelity, diversity, and computational efficiency at the same time. To address these limitations, we propose the use of a Latent Diffusion Model (LDM) for attack data augmentation in IoT intrusion detection and provide a comprehensive comparison against state-of-the-art baselines. Experiments were conducted on three representative IoT attack types, specifically Distributed Denial-of-Service (DDoS), Mirai, and Man-in-the-Middle, evaluating both downstream IDS performance and intrinsic generative quality using distributional, dependency-based, and diversity metrics. Results show that balancing the training data with LDM-generated samples substantially improves IDS performance, achieving F1-scores of up to 0.99 for DDoS and Mirai attacks and consistently outperforming competing methods. Additionally, quantitative and qualitative analyses demonstrate that LDMs effectively preserve feature dependencies while generating diverse samples, and that they reduce sampling time by approximately 25% compared to diffusion models operating directly in data space. These findings highlight latent diffusion as an effective and scalable solution for generating synthetic IoT attack data, substantially mitigating the impact of class imbalance on ML-based IDSs in IoT scenarios.
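To make the augmentation pipeline summarized above more concrete, the following is a minimal NumPy sketch of latent-diffusion-based class balancing. It is not the authors' implementation, and several pieces are assumptions introduced only for illustration: `make_linear_autoencoder` is a hypothetical fixed orthonormal projection standing in for a trained encoder/decoder; `gaussian_oracle_eps` is the closed-form optimal noise predictor for Gaussian latents and replaces the trained denoising network; the linear beta schedule follows the original DDPM formulation; and the balancing target is assumed to be "generate as many attack rows as needed to match the benign count". What the sketch does show is the structure of an LDM: the reverse diffusion loop runs over `n_latent` dimensions rather than the full feature dimension, and only the final latents are decoded back to feature space.

```python
import numpy as np


def make_linear_autoencoder(n_features, n_latent, rng):
    """Hypothetical stand-in for a trained autoencoder: an orthonormal linear projection."""
    q, _ = np.linalg.qr(rng.standard_normal((n_features, n_latent)))  # q: (n_features, n_latent)
    encode = lambda x: x @ q    # (n, n_features) -> (n, n_latent)
    decode = lambda z: z @ q.T  # (n, n_latent) -> (n, n_features)
    return encode, decode


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linear noise schedule; returns betas, alphas and cumulative products alpha_bar."""
    betas = np.linspace(beta_start, beta_end, T)
    alphas = 1.0 - betas
    return betas, alphas, np.cumprod(alphas)


def gaussian_oracle_eps(mean, var, alpha_bars):
    """Optimal noise predictor E[eps | z_t] when latents are N(mean, diag(var)).

    Assumption: this closed form replaces the trained denoising network of a real LDM.
    """
    def eps_model(z_t, t):
        a, s = np.sqrt(alpha_bars[t]), np.sqrt(1.0 - alpha_bars[t])
        return s * (z_t - a * mean) / (a**2 * var + s**2)
    return eps_model


def ddpm_sample(eps_model, n, n_latent, betas, alphas, alpha_bars, rng):
    """Ancestral DDPM sampling carried out entirely in the latent space."""
    z = rng.standard_normal((n, n_latent))  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        eps_hat = eps_model(z, t)
        # Posterior mean: (z_t - beta_t / sqrt(1 - alpha_bar_t) * eps_hat) / sqrt(alpha_t)
        z = (z - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
        if t > 0:
            z += np.sqrt(betas[t]) * rng.standard_normal(z.shape)  # variance beta_t
    return z


def balance_with_ldm(x_benign, x_attack, n_latent=4, T=100, seed=0):
    """Generate enough synthetic attack rows to match the benign class size."""
    rng = np.random.default_rng(seed)
    n_new = max(len(x_benign) - len(x_attack), 0)
    encode, decode = make_linear_autoencoder(x_attack.shape[1], n_latent, rng)
    z_attack = encode(x_attack)
    betas, alphas, alpha_bars = linear_beta_schedule(T)
    eps_model = gaussian_oracle_eps(
        z_attack.mean(axis=0), z_attack.var(axis=0) + 1e-6, alpha_bars
    )
    z_new = ddpm_sample(eps_model, n_new, n_latent, betas, alphas, alpha_bars, rng)
    return decode(z_new)  # map synthetic latents back to feature space


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x_benign = rng.normal(0.0, 1.0, size=(1000, 20))  # toy "benign" flows, 20 features
    x_attack = rng.normal(3.0, 0.5, size=(100, 20))   # toy minority "attack" flows
    x_synth = balance_with_ldm(x_benign, x_attack)
    print(x_synth.shape)  # (900, 20)
```

Because the denoiser only ever processes `n_latent`-dimensional vectors, each reverse step is cheaper than in a diffusion model that operates directly on all features. How much sampling time this saves in practice depends on the actual encoder and denoiser architectures, which this toy example does not attempt to reproduce.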