Machine learning can analyze the vast amounts of data generated by IoT devices to identify patterns, make predictions, and enable real-time decision-making. By processing sensor data, machine learning models can optimize processes, improve efficiency, and personalize user experiences in smart systems. However, IoT systems are often deployed in sensitive environments such as households and offices, where they may inadvertently expose identifiable information, including location, habits, and personal identifiers. This raises significant privacy concerns and calls for data minimization, a foundational principle in emerging data regulations that requires service providers to collect only data that is directly relevant and necessary for a specified purpose. Despite its importance, data minimization lacks a precise technical definition for sensor data, where information is spread across collections of weak signals and a binary "relevant and necessary" rule is therefore hard to apply. This paper provides a technical interpretation of data minimization in the context of sensor streams, explores practical methods for implementing it, and addresses the challenges involved. We demonstrate that our framework can reduce user identifiability by up to 16.7% while keeping accuracy loss below 1%, offering a viable path toward privacy-preserving IoT data processing.