Personalization in machine learning (ML) tailors models' decisions to the individual characteristics of users. While this approach has seen success in areas like recommender systems, its expansion into high-stakes fields such as healthcare and autonomous driving is hindered by the extensive regulatory approval processes involved. To address this challenge, we propose a novel framework termed represented Markov Decision Processes (r-MDPs) that is designed to balance the need for personalization with the regulatory constraints. In an r-MDP, we cater to a diverse user population, each with unique preferences, through interaction with a small set of representative policies. Our objective is twofold: efficiently match each user to an appropriate representative policy and simultaneously optimize these policies to maximize overall social welfare. We develop two deep reinforcement learning algorithms that efficiently solve r-MDPs. These algorithms draw inspiration from the principles of classic K-means clustering and are underpinned by robust theoretical foundations. Our empirical investigations, conducted across a variety of simulated environments, showcase the algorithms' ability to facilitate meaningful personalization even under constrained policy budgets. Furthermore, they demonstrate scalability, efficiently adapting to larger policy budgets.
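The K-means-style alternation described above (match each user to a representative policy, then improve each policy for the users matched to it) can be illustrated with a minimal sketch. This is not the deep reinforcement learning algorithms proposed here: it assumes a toy one-step decision problem in which each user is fully described by a known reward vector over actions, and a policy is a probability distribution over those actions. The function names (`assign_users`, `update_policies`, `fit`, `social_welfare`) and the random initialization are hypothetical choices made for illustration only.

```python
import random


def user_value(reward, policy):
    """Expected reward of one user under a policy (a distribution over actions)."""
    return sum(p * r for p, r in zip(policy, reward))


def assign_users(rewards, policies):
    """Assignment step: match each user to the policy that user values most."""
    return [
        max(range(len(policies)), key=lambda k: user_value(r, policies[k]))
        for r in rewards
    ]


def update_policies(rewards, assignment, policies):
    """Update step: each policy maximizes the total reward of its own users."""
    n_actions = len(rewards[0])
    updated = []
    for k, old_policy in enumerate(policies):
        members = [r for r, a in zip(rewards, assignment) if a == k]
        if not members:
            updated.append(old_policy)  # no users matched: leave it unchanged
            continue
        totals = [sum(r[a] for r in members) for a in range(n_actions)]
        best = max(range(n_actions), key=totals.__getitem__)
        # In this toy setting the welfare-maximizing policy is deterministic.
        updated.append([1.0 if a == best else 0.0 for a in range(n_actions)])
    return updated


def social_welfare(rewards, assignment, policies):
    """Sum over users of their expected reward under their matched policy."""
    return sum(user_value(r, policies[k]) for r, k in zip(rewards, assignment))


def fit(rewards, n_policies, n_iters=20, seed=0):
    """Alternate update and assignment steps until the matching stops changing."""
    rng = random.Random(seed)
    n_actions = len(rewards[0])
    policies = []
    for _ in range(n_policies):  # assumed initialization: random actions
        a0 = rng.randrange(n_actions)
        policies.append([1.0 if a == a0 else 0.0 for a in range(n_actions)])
    assignment = assign_users(rewards, policies)
    for _ in range(n_iters):
        policies = update_policies(rewards, assignment, policies)
        new_assignment = assign_users(rewards, policies)
        if new_assignment == assignment:
            break
        assignment = new_assignment
    return policies, assignment
```

Calling `fit(rewards, n_policies)` with a list of per-user reward vectors returns the representative policies and the user-to-policy matching. In this toy setting neither step can decrease social welfare, which mirrors the monotone improvement of K-means. Like K-means, the sketch can stop at a local optimum that depends on the initialization.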