An Energy-Efficient Ensemble Approach for Mitigating Data Incompleteness in IoT Applications

From arXiv. 8 pages, 8 figures, 1 table. Accepted as a conference paper at the IEEE International Conference on Distributed Computing in Smart Systems and the Internet of Things (DCOSS-IoT 2024).

Machine Learning (ML) is becoming increasingly important for IoT-based applications. However, the dynamic and ad-hoc nature of many IoT ecosystems poses unique challenges to the efficacy of ML algorithms. One such challenge is data incompleteness, which manifests as missing sensor readings and can be caused by many factors, including sensor failures and network disruptions. Furthermore, most IoT systems are severely power-constrained. It is therefore important to build IoT-based ML systems that are robust against data incompleteness while also being energy efficient. This paper presents an empirical study of SECOE, a recent technique for alleviating data incompleteness in IoT, with respect to its energy bottlenecks. To address these bottlenecks, we propose ENAMLE, a proactive, energy-aware technique for mitigating the impact of concurrent missing data. ENAMLE is unique in that it builds an energy-aware ensemble of sub-models, each trained on a subset of sensors chosen carefully based on their correlations. Furthermore, at inference time, ENAMLE adaptively adjusts the number of models in the ensemble based on the missing data rate and the energy-accuracy trade-off. ENAMLE's design includes several novel mechanisms for minimizing energy consumption while maintaining accuracy. We present extensive experimental studies on two distinct datasets that demonstrate the energy efficiency of ENAMLE and its ability to alleviate the impact of sensor failures.
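The inference-time adaptation described above can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the selection policy, so the names (`SubModel`, `choose_ensemble_size`, `ensemble_predict`), the thresholds `low` and `high`, the rule that more missing data triggers more sub-models, and the choice to prefer the cheapest sub-models are all assumptions made for illustration. Sensor subsets are taken as given rather than derived from correlations.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Callable, Dict, List, Optional, Sequence

Reading = Dict[str, Optional[float]]  # sensor name -> value, None if missing


@dataclass
class SubModel:
    """Hypothetical sub-model trained on one subset of sensors."""
    sensors: Sequence[str]               # sensors this sub-model needs
    predict: Callable[[Reading], float]  # stand-in for a trained predictor
    energy_cost: float                   # assumed per-inference cost (arbitrary units)


def missing_rate(reading: Reading, all_sensors: Sequence[str]) -> float:
    """Fraction of sensors whose value is absent in this reading."""
    missing = sum(1 for s in all_sensors if reading.get(s) is None)
    return missing / len(all_sensors)


def choose_ensemble_size(rate: float, max_models: int,
                         low: float = 0.1, high: float = 0.4) -> int:
    """Assumed policy: run one sub-model when little data is missing,
    all available sub-models when much is missing, and interpolate
    linearly in between. The thresholds are illustrative only."""
    if max_models <= 0:
        return 0
    if rate <= low:
        return 1
    if rate >= high:
        return max_models
    frac = (rate - low) / (high - low)
    return max(1, round(1 + frac * (max_models - 1)))


def ensemble_predict(reading: Reading, models: List[SubModel],
                     all_sensors: Sequence[str]) -> Optional[float]:
    """Average the predictions of the cheapest usable sub-models."""
    # A sub-model is usable only if every sensor it was trained on is present.
    usable = [m for m in models
              if all(reading.get(s) is not None for s in m.sensors)]
    if not usable:
        return None
    k = choose_ensemble_size(missing_rate(reading, all_sensors), len(usable))
    # Energy-aware choice (assumption): prefer the lowest-cost sub-models.
    chosen = sorted(usable, key=lambda m: m.energy_cost)[:k]
    return mean([m.predict(reading) for m in chosen])
```

With all sensors present, this sketch runs only the single cheapest sub-model; as readings go missing, it spends more energy on additional sub-models whose inputs are still available. A real system would replace the threshold rule with a measured energy-accuracy trade-off.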
