This paper presents an approach that combines Human-In-The-Loop Reinforcement Learning (HITL RL) with principles derived from music theory to enable real-time generation of musical compositions. HITL RL, previously employed in diverse applications such as modelling humanoid robot mechanics and enhancing language models, harnesses human feedback to refine the training process. In this study, we develop a HITL RL framework that leverages the constraints and principles of music theory. In particular, we propose an episodic tabular Q-learning algorithm with an epsilon-greedy exploration policy. The system generates musical tracks (compositions) and continuously improves their quality through iterative human-in-the-loop feedback. The reward function for this process is the user's subjective musical taste.
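To make the algorithmic idea concrete, the following is a minimal sketch of episodic tabular Q-learning with epsilon-greedy exploration, in which one generated track is one episode and a listener's rating is the only reward. It is not the paper's implementation. The state and action encoding (scale degrees 0 to 6), the interval constraint in `allowed_actions`, the function names, and the values of `alpha` and `gamma` are all assumptions chosen for illustration.

```python
import random
from collections import defaultdict

# Assumption: notes are the seven degrees of a diatonic scale (0 = tonic).
NOTES = list(range(7))


def allowed_actions(state):
    """Hypothetical music-theory constraint: move by at most a third."""
    return [n for n in NOTES if abs(n - state) <= 2]


def choose_action(q, state, epsilon, rng):
    """Epsilon-greedy choice among the notes the constraint permits."""
    actions = allowed_actions(state)
    if rng.random() < epsilon:
        return rng.choice(actions)  # explore
    return max(actions, key=lambda a: q[(state, a)])  # exploit


def generate_track(q, length, epsilon, rng, start=0):
    """One episode: the state is the current note, the action is the next note."""
    state = start
    track = [state]
    for _ in range(length - 1):
        action = choose_action(q, state, epsilon, rng)
        track.append(action)
        state = action
    return track


def update_from_rating(q, track, rating, alpha=0.1, gamma=0.9):
    """Q-learning update with the listener's rating as the terminal reward.

    Intermediate transitions receive zero reward (an assumption of this sketch).
    """
    transitions = list(zip(track[:-1], track[1:]))
    for i, (s, a) in enumerate(transitions):
        terminal = i == len(transitions) - 1
        reward = rating if terminal else 0.0
        # After taking action a, the next state is the note a itself.
        next_best = 0.0 if terminal else max(
            q[(a, a2)] for a2 in allowed_actions(a)
        )
        q[(s, a)] += alpha * (reward + gamma * next_best - q[(s, a)])


if __name__ == "__main__":
    rng = random.Random(0)
    q = defaultdict(float)  # the Q-table, defaulting to 0 for unseen pairs
    for episode in range(3):
        track = generate_track(q, length=8, epsilon=0.2, rng=rng)
        # Stand-in for the human in the loop: prefers tracks ending on the tonic.
        rating = 1.0 if track[-1] == 0 else -1.0
        update_from_rating(q, track, rating)
        print(episode, track, rating)
```

In an interactive setting, the simulated rating in the `__main__` block would be replaced by a score collected from the user after each track is played. A single end-of-episode rating keeps the feedback burden low, at the cost of slower credit assignment to earlier notes.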