Music representations are the backbone of modern recommendation systems, powering playlist generation, similarity search, and personalized discovery. Yet most embeddings offer little control over individual musical attributes, e.g., changing only the mood of a track while preserving its genre or instrumentation. In this work, we address the problem of controllable music retrieval through embedding-based transformation, where the objective is to retrieve songs that remain similar to a seed track but differ from it along one chosen dimension. We propose a novel framework for mood-guided music embedding transformation, which learns a mapping from a seed audio embedding to a target embedding guided by mood labels, while preserving other musical attributes. Because mood cannot be directly altered in the seed audio, we introduce a sampling mechanism that retrieves proxy targets, balancing diversity against similarity to the seed. We train a lightweight translation model using this sampling strategy and introduce a novel joint objective that encourages both mood transformation and information preservation. Extensive experiments on two datasets show strong mood transformation performance while retaining genre and instrumentation far better than training-free baselines, establishing controllable embedding transformation as a promising paradigm for personalized music retrieval.
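To make the proxy-target sampling and the joint objective more concrete, the following is a minimal sketch of one plausible reading of these two ideas. It is not the authors' implementation. The function names (`sample_proxy_target`, `joint_loss`), the top-`k` cosine-similarity shortlist, the cosine-distance loss terms, and the weight `alpha` are all assumptions introduced here for illustration; the abstract does not specify any of them.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-12):
    """Scale vectors to unit length so dot products equal cosine similarity."""
    norm = np.linalg.norm(x, axis=axis, keepdims=True)
    return x / np.maximum(norm, eps)


def sample_proxy_target(seed, catalog, moods, target_mood, k=5, rng=None):
    """Pick a proxy target index for a seed embedding (hypothetical scheme).

    Assumption: among catalog tracks labeled with the target mood, shortlist
    the k most similar to the seed, then sample one uniformly at random.
    Small k favors similarity to the seed; large k favors diversity.
    """
    rng = np.random.default_rng() if rng is None else rng
    moods = np.asarray(moods)
    candidates = np.flatnonzero(moods == target_mood)
    if candidates.size == 0:
        raise ValueError(f"no catalog track has mood {target_mood!r}")
    # Cosine similarity between the seed and each candidate embedding.
    sims = l2_normalize(catalog[candidates]) @ l2_normalize(seed)
    shortlist = candidates[np.argsort(-sims)[:k]]
    return int(rng.choice(shortlist))


def joint_loss(pred, target, seed, alpha=0.5):
    """Assumed joint objective: transformation term plus preservation term.

    transform: cosine distance between the translated embedding and the proxy.
    preserve:  cosine distance between the translated embedding and the seed.
    alpha is a hypothetical weight trading the two terms off.
    """
    pred, target, seed = (l2_normalize(v) for v in (pred, target, seed))
    transform = 1.0 - float(pred @ target)
    preserve = 1.0 - float(pred @ seed)
    return transform + alpha * preserve
```

In this sketch, a training pair would be formed by calling `sample_proxy_target` for each seed and target mood, and the translation model's output would be scored with `joint_loss`. The single parameter `k` is one simple way to express the diversity-versus-similarity trade-off mentioned in the abstract; the actual mechanism may differ.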