Douyin Music, a large-scale platform with millions of daily users, adopts an immersive, feed-based discovery paradigm in which users passively explore music through continuous recommendations. While effective for passive music discovery, this paradigm restricts users to the recommended results and offers limited support for explicitly specifying listening intents. Unlike conventional search, where users express well-defined intents through explicit queries such as specific songs or artists, real-world active music discovery is often situational and colloquial, involving vague or underspecified requests. Although large language models (LLMs) enable natural language interaction, their direct use in music discovery remains limited by insufficient music-domain knowledge, a lack of music-query collaborative reasoning, and a shallow understanding of personalized preferences. To address these challenges, we introduce MuChator, an interactive MusicLLM-based framework that enables users to actively express situational music intents in natural language. MuChator incorporates three key components: (1) Music Knowledge Pre-training, a three-stage scheme that incrementally injects objective music knowledge, subjective music knowledge, and personalized music preferences into LLMs; (2) Context-aware Instruction Tuning, which constructs high-quality user-query-music triplets through an automated synthesis pipeline to align LLMs with active and situational user intents; and (3) Preference Alignment with Hybrid RM, a hybrid reward model that jointly models intent relevance, personalized preferences, and basic constraints, and is optimized using GRPO-based reinforcement learning. Extensive evaluations on industrial music recommendation datasets demonstrate that MuChator outperforms leading proprietary models such as Gemini-3-Pro. The model has been deployed in the Douyin Music app at ByteDance, where an online A/B test showed a 46.49% improvement in user active days.
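To make the third component more concrete, the following is a minimal sketch of how a hybrid reward could feed a GRPO-style update. The abstract does not say how the three signals are combined, so the weighted sum, the constraint gate, the weights, and all names here (`RewardSignals`, `hybrid_reward`, `group_relative_advantages`) are illustrative assumptions rather than MuChator's actual design. The only part taken from GRPO itself is the group-relative advantage, which standardizes each sampled response's reward against the mean and standard deviation of its group.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


@dataclass
class RewardSignals:
    """Per-response scores; the field names and ranges are assumptions."""
    relevance: float      # intent relevance, assumed to lie in [0, 1]
    preference: float     # personalized preference, assumed to lie in [0, 1]
    constraints_ok: bool  # whether basic constraints (e.g. valid output) hold


def hybrid_reward(s, w_rel=0.6, w_pref=0.4, violation_penalty=-1.0):
    """Combine the three signals into one scalar (assumed weighted sum).

    A response that breaks a basic constraint gets a fixed penalty
    regardless of its other scores; this gating is an assumption.
    """
    if not s.constraints_ok:
        return violation_penalty
    return w_rel * s.relevance + w_pref * s.preference


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantages: standardize rewards within one sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Three hypothetical responses sampled for the same user query.
group = [
    RewardSignals(relevance=1.0, preference=0.5, constraints_ok=True),
    RewardSignals(relevance=0.0, preference=0.0, constraints_ok=True),
    RewardSignals(relevance=1.0, preference=1.0, constraints_ok=False),
]
rewards = [hybrid_reward(s) for s in group]
advantages = group_relative_advantages(rewards)
```

In this sketch the constraint-violating response receives the lowest advantage even though its relevance and preference scores are high, which is one plausible way to keep basic constraints from being traded off against the other two signals.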