There is a constant trade-off between the utility of the data collected and processed by the many systems that make up the Internet of Things (IoT) and the privacy concerns of the users who live in the spaces hosting these sensors. Privacy models, such as the SITA (Spatial, Identity, Temporal, and Activity) model, can help address this trade-off. In this paper, we focus on the problem of $CO_2$ prediction, which is crucial for health monitoring but can also be used to monitor occupancy and may therefore reveal private information. We apply a number of transformations to a real dataset from a smart building to simulate different SITA configurations on the collected data. We then use the transformed data with multiple Machine Learning (ML) techniques to analyse how well the resulting models predict $CO_2$ levels. Our results show that, relative to the baseline data, the different SITA configurations do not make any one algorithm perform better or worse than the others. In our experiments, the temporal dimension was particularly sensitive, with scores decreasing by up to $18.9\%$ between the original and the transformed data. These results can help show the effect of different levels of data privacy on the data utility of IoT applications, and can help identify which parameters are most relevant for those systems, so that higher privacy settings can be adopted while data utility is preserved.