Conversational recommender systems have emerged as a powerful means of efficiently eliciting user preferences. These systems interactively present users with queries associated with "key terms" and leverage the resulting feedback to estimate user preferences more quickly. Nonetheless, most existing algorithms are centralized. In this paper, we introduce FedConPE, a phase elimination-based federated conversational bandit algorithm in which $M$ agents collaboratively solve a global contextual linear bandit problem with the help of a central server while ensuring secure data management. To effectively coordinate all clients and aggregate their collected data, FedConPE adaptively constructs key terms that minimize uncertainty across all dimensions of the feature space. Furthermore, compared with existing federated linear bandit algorithms, FedConPE offers better computational and communication efficiency as well as stronger privacy protection. Our theoretical analysis shows that FedConPE is minimax near-optimal in terms of cumulative regret. We also establish upper bounds on communication costs and conversation frequency. Comprehensive evaluations demonstrate that FedConPE outperforms existing conversational bandit algorithms while using fewer conversations.
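The abstract does not spell out FedConPE's update rules. As a rough illustration of the generic ingredients it names, here is a minimal Python/NumPy sketch: clients share only aggregated statistics with a server, the server eliminates arms phase by phase, and poorly explored directions of the feature space are located. All function names, the ridge regularizer `reg`, the $2\varepsilon$ elimination margin, and the eigenvalue `threshold` are assumptions borrowed from standard phase-elimination linear bandits. They are not the paper's algorithm.

```python
import numpy as np

def local_statistics(features, rewards):
    """Client-side step (assumed): only the Gram matrix X^T X and the
    reward-weighted feature sum X^T r leave the client, not raw (x, r) pairs."""
    X = np.asarray(features, dtype=float)
    r = np.asarray(rewards, dtype=float)
    return X.T @ X, X.T @ r

def aggregate(local_stats, dim, reg=1.0):
    """Server-side step (assumed): sum every client's (V_i, b_i) and return
    the regularized Gram matrix V and the ridge least-squares estimate of theta."""
    V = reg * np.eye(dim)
    b = np.zeros(dim)
    for V_i, b_i in local_stats:
        V += V_i
        b += b_i
    theta_hat = np.linalg.solve(V, b)
    return V, theta_hat

def eliminate(arms, theta_hat, eps):
    """Generic phase-elimination rule: keep arms whose estimated reward is
    within 2 * eps of the best estimated reward in the current phase."""
    arms = np.asarray(arms, dtype=float)
    est = arms @ theta_hat
    return arms[est >= est.max() - 2 * eps]

def uncertain_directions(V, threshold):
    """Hypothetical helper: eigenvectors of V whose eigenvalues fall below
    `threshold` mark feature-space directions where the estimate is least
    certain. A key term aligned with them would reduce that uncertainty."""
    eigvals, eigvecs = np.linalg.eigh(V)
    return eigvecs[:, eigvals < threshold].T

# Toy example: client 1 only observes dimension 0, client 2 only dimension 1.
stats = [local_statistics([[1, 0], [1, 0]], [1, 1]),
         local_statistics([[0, 1]], [0.5])]
V, theta_hat = aggregate(stats, dim=2)       # theta_hat is [2/3, 0.25]
arms = [[1, 0], [0, 1], [0.5, 0.5]]
active = eliminate(arms, theta_hat, eps=0.2)  # drops the arm [0, 1]
directions = uncertain_directions(V, threshold=2.5)  # roughly +/-[0, 1]
```

Because the server only ever sees summed Gram matrices and reward vectors, this sketch hints at why aggregation of sufficient statistics is both communication-light and more private than sharing raw interactions. The actual FedConPE protocol may differ.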