Affective Music Recommendation: A Rollout-Based World Model for Offline Preference Optimization

Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by the listener's affective state, but online experimentation on emotion is ethically constrained, particularly for clinical populations who cannot reliably skip a song or report distress. We describe AMRS, the Affective Music Recommendation System deployed on LUCID's health-and-wellness platforms, which serve clinical users (primarily older adults with neurocognitive conditions) and consumer-wellness users across energize, focus, calm, and sleep modes. AMRS is built around a rollout-based world model: a causal transformer trained on logged listening data to jointly predict engagement, binary rating, and self-reported valence and arousal. The world model serves both as an in-silico simulator for offline policy training and as a stress-testing tool before deployment. A recommender policy initialized by behaviour cloning is fine-tuned offline with Direct Preference Optimization (DPO) against a configurable multi-objective utility function. Under a strict cold-start protocol, the world model predicts both behavioural and affective signals with usable fidelity; DPO improves predicted valence and arousal over the cloned baseline while maintaining a similar diversity profile and avoiding the distributional collapse produced by greedy optimization. We position the work as an early deployed validation of a methodology for affective recommendation when online experimentation is ethically untenable.
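The abstract does not say how the multi-objective utility is weighted or how DPO preference pairs are built from world-model rollouts. The block below is a minimal sketch under these assumptions:

- The world model returns one point prediction of engagement, positive-rating probability, valence, and arousal for each candidate track.
- The utility is a configurable weighted sum of those predictions.
- The highest- and lowest-utility candidates form the (chosen, rejected) pair.
- That pair is then scored with the standard DPO loss.

All names are hypothetical: `Prediction`, `utility`, `make_preference_pair`, `dpo_loss`, the weight keys, the track ids, and the `beta` default.

```python
import math
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Prediction:
    """Hypothetical world-model outputs for one candidate track."""
    engagement: float  # assumed: predicted fraction of the track listened to
    p_like: float      # assumed: predicted probability of a positive binary rating
    valence: float     # predicted self-reported valence
    arousal: float     # predicted self-reported arousal


def utility(pred: Prediction, weights: dict) -> float:
    """Assumed form of the configurable utility: a weighted sum of predictions."""
    return (weights.get("engagement", 0.0) * pred.engagement
            + weights.get("p_like", 0.0) * pred.p_like
            + weights.get("valence", 0.0) * pred.valence
            + weights.get("arousal", 0.0) * pred.arousal)


def make_preference_pair(candidates: List[Tuple[str, Prediction]],
                         weights: dict) -> Tuple[str, str]:
    """Rank candidates by predicted utility; return (chosen_id, rejected_id)."""
    ranked = sorted(candidates, key=lambda c: utility(c[1], weights))
    return ranked[-1][0], ranked[0][0]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(logp_policy_chosen: float, logp_policy_rejected: float,
             logp_ref_chosen: float, logp_ref_rejected: float,
             beta: float = 0.1) -> float:
    """Standard DPO loss for one pair; the reference is the behaviour-cloned policy."""
    margin = ((logp_policy_chosen - logp_ref_chosen)
              - (logp_policy_rejected - logp_ref_rejected))
    return -log_sigmoid(beta * margin)


# Illustrative weights for a hypothetical calm mode: lower arousal is rewarded.
weights = {"engagement": 0.3, "p_like": 0.2, "valence": 0.5, "arousal": -0.2}
candidates = [
    ("track_a", Prediction(engagement=0.9, p_like=0.8, valence=0.6, arousal=0.7)),
    ("track_b", Prediction(engagement=0.5, p_like=0.4, valence=0.2, arousal=0.9)),
    ("track_c", Prediction(engagement=0.7, p_like=0.6, valence=0.7, arousal=0.2)),
]
print(make_preference_pair(candidates, weights))  # -> ('track_c', 'track_b')
```

Two design choices shape this sketch:

- **Preference pairs instead of greedy selection.** Training on relative preferences, with the loss anchored to the cloned reference policy, is one plausible reason DPO avoids the collapse the abstract attributes to greedy optimization. Greedy optimization would always recommend the single top-utility track.
- **Role of `beta`.** `beta` controls how far the fine-tuned policy may drift from the reference.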
