Algorithmic Data Minimization for Machine Learning over Internet-of-Things Data Streams

Machine learning can analyze the vast amounts of data generated by IoT devices to identify patterns, make predictions, and enable real-time decision-making. By processing sensor data, machine learning models can optimize processes, improve efficiency, and personalize user experiences in smart systems. However, IoT systems are often deployed in sensitive environments such as households and offices, where they may inadvertently expose identifiable information, including location, habits, and personal identifiers. This raises significant privacy concerns and calls for data minimization, a foundational principle in emerging data regulations that requires service providers to collect only data that is directly relevant and necessary for a specified purpose. Despite its importance, data minimization lacks a precise technical definition for sensor data, where collections of weak signals make a binary "relevant and necessary" rule hard to apply. This paper provides a technical interpretation of data minimization for sensor streams, explores practical methods for implementing it, and addresses the challenges involved. We demonstrate that our framework can reduce user identifiability by up to 16.7% while keeping accuracy loss below 1%, offering a viable path toward privacy-preserving IoT data processing.
