Functional music applications, from consumer focus and sleep aids to clinical interventions, share a distinctive recommendation problem: success is defined by the listener's affective state, but online experimentation on emotion is ethically constrained, particularly for clinical populations who cannot reliably skip a song or report distress. We describe AMRS, the Affective Music Recommendation System deployed on LUCID's health-and-wellness platforms, which serve clinical users (primarily older adults with neurocognitive conditions) and consumer-wellness users across energize, focus, calm, and sleep modes. AMRS is built around a rollout-based world model: a causal transformer trained on logged listening data to jointly predict engagement, binary rating, and self-reported valence and arousal. The world model serves both as an in-silico simulator for offline policy training and as a stress-testing tool before deployment. A recommender policy initialized by behaviour cloning is fine-tuned offline with Direct Preference Optimization (DPO) against a configurable multi-objective utility function. Under a strict cold-start protocol, the world model predicts both behavioural and affective signals with usable fidelity; DPO improves predicted valence and arousal over the cloned baseline while maintaining a similar diversity profile and avoiding the distributional collapse produced by greedy optimization. We position the work as an early deployed validation of a methodology for affective recommendation when online experimentation is ethically untenable.
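To make the offline fine-tuning step concrete, the following is a minimal sketch of how preference pairs might be derived from world-model predictions and scored with the standard per-pair DPO objective. It is not the AMRS implementation: the `Predicted` fields, the weighted-sum form of `utility`, the weight values, the assumption that all signals are scaled to [0, 1], and `beta = 0.1` are illustrative assumptions.

```python
import math
from dataclasses import dataclass


@dataclass
class Predicted:
    """World-model outputs for one candidate track (hypothetical fields)."""
    engagement: float  # assumed: predicted fraction of the track played, in [0, 1]
    rating: float      # assumed: predicted probability of a positive rating
    valence: float     # assumed: predicted self-reported valence, scaled to [0, 1]
    arousal: float     # assumed: predicted self-reported arousal, scaled to [0, 1]


def utility(p, weights):
    """Configurable multi-objective utility; a weighted sum is an assumption."""
    return (weights["engagement"] * p.engagement
            + weights["rating"] * p.rating
            + weights["valence"] * p.valence
            + weights["arousal"] * p.arousal)


def preference_pair(a, b, weights):
    """Return (chosen, rejected): the higher-utility candidate comes first."""
    return (a, b) if utility(a, weights) >= utility(b, weights) else (b, a)


def dpo_loss(logp_chosen, logp_rejected,
             ref_logp_chosen, ref_logp_rejected, beta=0.1):
    """Per-pair DPO loss: -log sigmoid(beta * margin).

    The margin compares how much more the policy favours the chosen item
    over the rejected one than the frozen reference (cloned) policy does.
    """
    margin = (logp_chosen - ref_logp_chosen) - (logp_rejected - ref_logp_rejected)
    x = beta * margin
    # -log(sigmoid(x)) == log(1 + exp(-x)), written in a numerically stable form.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))


# Illustrative weights only; a real configuration would differ per mode.
weights = {"engagement": 0.2, "rating": 0.2, "valence": 0.4, "arousal": 0.2}
track_a = Predicted(engagement=0.9, rating=0.6, valence=0.4, arousal=0.5)
track_b = Predicted(engagement=0.7, rating=0.6, valence=0.8, arousal=0.5)
chosen, rejected = preference_pair(track_a, track_b, weights)
```

In this sketch the world model plays the role of the preference labeller, so no listener is exposed to an exploratory recommendation. Because the loss is anchored to the reference policy's log-probabilities, the fine-tuned policy is penalized for drifting far from the cloned baseline, which is one plausible reading of why DPO avoids the collapse seen under greedy optimization. The sign of each weight is itself a configuration choice; for example, a calm or sleep mode might assign arousal a negative weight.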