The rapid expansion of the Internet of Things (IoT) in domains such as smart cities, transportation, and industrial systems has heightened the urgency of addressing IoT security vulnerabilities. IoT devices often operate with limited computational resources, lack robust physical safeguards, and are deployed in heterogeneous and dynamic networks, making them prime targets for cyberattacks and malware. Machine learning (ML) offers a promising approach to automated malware detection and classification, but practical deployment requires models that are both effective and lightweight. This study investigates the effectiveness of four supervised learning models (Random Forest, LightGBM, Logistic Regression, and a Multi-Layer Perceptron) for malware detection and classification on the IoT-23 dataset. We evaluate model performance on both binary and multiclass classification tasks, assess sensitivity to training data volume, and analyze temporal robustness to simulate deployment in an evolving threat landscape. Our results show that tree-based models achieve high accuracy and generalize well, even with limited training data, but that performance deteriorates over time as malware diversity increases. These findings underscore the importance of adaptive, resource-efficient ML models for securing IoT systems in real-world environments.
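The evaluation protocol summarized above (a time-ordered train/test split combined with varying training data volume) can be illustrated with a minimal, standard-library-only sketch. Everything in it is an assumption made for illustration: the synthetic two-feature "flows", the field names `t`, `x`, and `y`, the cutoff of 300, the training fractions, and the nearest-centroid classifier, which is only a stand-in and is not one of the four models studied. It does not use IoT-23 and does not reproduce any reported result.

```python
import random
from collections import defaultdict


def temporal_split(rows, cutoff):
    """Train on flows observed before `cutoff`; test on flows at or after it."""
    train = [r for r in rows if r["t"] < cutoff]
    test = [r for r in rows if r["t"] >= cutoff]
    return train, test


def subsample(rows, fraction, seed=0):
    """Keep a random fraction of the training rows (at least one row)."""
    rng = random.Random(seed)
    k = max(1, int(len(rows) * fraction))
    return rng.sample(rows, k)


def fit_centroids(rows):
    """Stand-in model (assumption): per-class mean of the feature vector."""
    sums, counts = {}, defaultdict(int)
    for r in rows:
        acc = sums.setdefault(r["y"], [0.0] * len(r["x"]))
        for i, v in enumerate(r["x"]):
            acc[i] += v
        counts[r["y"]] += 1
    return {y: [s / counts[y] for s in acc] for y, acc in sums.items()}


def predict(centroids, x):
    """Return the label whose centroid is closest in squared Euclidean distance."""
    return min(
        centroids,
        key=lambda y: sum((a - b) ** 2 for a, b in zip(centroids[y], x)),
    )


def accuracy(centroids, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(predict(centroids, r["x"]) == r["y"] for r in rows) / len(rows)


def make_toy_flows(n=400, seed=1):
    """Synthetic, hypothetical data: one record per time step, two features."""
    rng = random.Random(seed)
    rows = []
    for t in range(n):
        y = rng.choice(["benign", "malicious"])
        center = 0.0 if y == "benign" else 3.0
        x = [rng.gauss(center, 1.0), rng.gauss(center, 1.0)]
        rows.append({"t": t, "x": x, "y": y})
    return rows


flows = make_toy_flows()
train, test = temporal_split(flows, cutoff=300)

results = {}
for fraction in (0.05, 0.25, 1.0):
    model = fit_centroids(subsample(train, fraction))
    results[fraction] = accuracy(model, test)
    print(f"train fraction {fraction:.2f}: accuracy {results[fraction]:.3f}")
```

Splitting on time rather than at random keeps later traffic out of training, which is what allows a temporal-robustness measurement to approximate deployment. In a real experiment the stand-in classifier would be replaced by the models under study and the toy generator by the actual dataset.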