Expert with Clustering: Hierarchical Online Preference Learning Framework

Emerging mobility systems are increasingly capable of recommending options to their users, guiding them toward personalized yet sustainable system outcomes. Minimizing regret is even more crucial here than in a typical recommendation system, because 1) the mobility options directly affect the lives of the users, and 2) system sustainability relies on sufficient user participation. In this study, we accelerate user preference learning by exploiting a low-dimensional latent space that captures the mobility preferences of users. We introduce a hierarchical contextual bandit framework named Expert with Clustering (EWC), which integrates clustering techniques with prediction with expert advice. EWC efficiently utilizes hierarchical user information and incorporates a novel Loss-guided Distance metric, which is instrumental in generating more representative cluster centroids. In a recommendation scenario with $N$ users, $T$ rounds per user, and $K$ options, our algorithm achieves a regret bound of $O(N\sqrt{T\log K} + NT)$. This bound consists of two parts: the first term is the regret from the Hedge algorithm, and the second term depends on the average loss from clustering. To the best of the authors' knowledge, this is the first work to analyze the regret of an expert algorithm integrated with k-Means clustering. The regret bound, together with our experimental results, supports the efficacy of EWC, particularly in scenarios that demand rapid learning and adaptation. In experiments, EWC reduces regret by 27.57% compared to the LinUCB baseline. Our work offers a data-efficient approach to capturing both individual and collective behaviors, making it highly applicable to contexts with hierarchical structures. We expect the algorithm to be applicable to other settings with layered nuances of user preferences and information.
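To make the combination of cluster centroids and expert advice concrete, the following is a minimal Python sketch of the Hedge step for a single user, not the authors' implementation. It assumes that the centroids are already available (for example, from k-Means on offline data), that each centroid scores an option linearly, and that the loss is 0/1 on whether the recommended option matches the user's choice. The function name `hedge_over_centroids`, the linear scoring rule, the learning rate, and the synthetic data are all illustrative assumptions. The Loss-guided Distance metric is not implemented here.

```python
import numpy as np

def hedge_over_centroids(centroids, contexts, choices, eta, rng):
    """Run Hedge for one user, treating each cluster centroid as an expert.

    centroids: (C, d) array, one preference vector per cluster (assumed given).
    contexts:  (T, K, d) array, features of the K options in each of T rounds.
    choices:   (T,) array, the option the user actually picked in each round.
    Returns the total 0/1 loss of the recommendations and the final weights.
    """
    n_experts = centroids.shape[0]
    weights = np.ones(n_experts)
    total_loss = 0
    for x_t, y_t in zip(contexts, choices):
        # Each expert recommends the option with the highest linear score.
        scores = x_t @ centroids.T          # shape (K, C)
        advice = scores.argmax(axis=0)      # shape (C,)
        # Sample an expert in proportion to its current weight.
        probs = weights / weights.sum()
        expert = rng.choice(n_experts, p=probs)
        total_loss += int(advice[expert] != y_t)
        # Multiplicative update: down-weight every expert that was wrong.
        losses = (advice != y_t).astype(float)
        weights *= np.exp(-eta * losses)
    return total_loss, weights / weights.sum()

# Synthetic demo (hypothetical data): the user's true preference vector
# coincides with centroid 0, so that expert is never wrong.
rng = np.random.default_rng(0)
T, K, d = 200, 3, 2
centroids = np.array([[1.0, 0.0], [0.0, 1.0]])
contexts = rng.normal(size=(T, K, d))
choices = (contexts @ centroids[0]).argmax(axis=1)
eta = np.sqrt(8 * np.log(len(centroids)) / T)  # a common textbook choice
total_loss, final_weights = hedge_over_centroids(
    centroids, contexts, choices, eta, rng
)
print(total_loss, final_weights)
```

In this toy setting the weight mass concentrates on the centroid that matches the user, which is the mechanism that lets a new user inherit the preferences of a similar cluster after only a few rounds.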
