adaPARL: Adaptive Privacy-Aware Reinforcement Learning for Sequential-Decision Making Human-in-the-Loop Systems

Reinforcement learning (RL) offers numerous benefits over rule-based approaches in a variety of applications. However, privacy concerns have grown with the widespread use of RL trained on privacy-sensitive data in IoT devices, especially in human-in-the-loop systems. On the one hand, RL methods enhance the user experience by trying to adapt to the highly dynamic nature of humans. On the other hand, trained policies can leak the user's private information. Recent work has therefore focused on designing privacy-aware RL algorithms that maintain an acceptable system utility. A central challenge in designing privacy-aware RL, especially for human-in-the-loop systems, is that humans have intrinsic variability and their preferences and behavior evolve over time. The effect of a given privacy-leak mitigation can differ across humans, and for the same human over time. Hence, we cannot design one fixed privacy-aware RL model that fits all. To that end, we propose adaPARL, an adaptive approach for privacy-aware RL, especially for human-in-the-loop IoT systems. adaPARL provides a personalized privacy-utility trade-off depending on human behavior and preference. We validate adaPARL on two IoT applications, namely (i) Human-in-the-Loop Smart Home and (ii) Human-in-the-Loop Virtual Reality (VR) Smart Classroom. Results on these two applications demonstrate the generality of adaPARL and its ability to provide a personalized privacy-utility trade-off. On average, for the first application, adaPARL improves the utility by $57\%$ over the baseline and by $43\%$ over randomization. adaPARL also reduces the privacy leak by $23\%$ on average. For the second application, adaPARL decreases the privacy leak to $44\%$ before the utility drops by $15\%$.
