We introduce anticipation: a method for constructing a controllable generative model of a temporal point process (the event process) conditioned asynchronously on realizations of a second, correlated process (the control process). We achieve this by interleaving sequences of events and controls, such that controls appear following stopping times in the event sequence. This work is motivated by problems arising in the control of symbolic music generation. We focus on infilling control tasks, whereby the controls are a subset of the events themselves, and conditional generation completes a sequence of events given the fixed control events. We train anticipatory infilling models using the large and diverse Lakh MIDI music dataset. These models match the performance of autoregressive models for prompted music generation, with the additional capability to perform infilling control tasks, including accompaniment. Human evaluators report that an anticipatory model produces accompaniments with musicality comparable even to human-composed music over 20-second clips.
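The interleaving described above can be illustrated with a minimal sketch. The sketch below is not the paper's implementation: the function name `interleave`, the tuple representation of timed tokens, and the fixed anticipation offset `delta` are all illustrative assumptions. The core idea it demonstrates is that a control occurring at time `s` is spliced into the event sequence once event times reach `s - delta`, so the model observes each control in advance of the events it should influence.

```python
def interleave(events, controls, delta=5.0):
    """Merge time-stamped events and controls into one sequence.

    events, controls: lists of (time, token), each sorted by time.
    delta: anticipation offset (hypothetical fixed value); a control
    at time s is emitted just before the first event at time >= s - delta.
    """
    out = []
    ci = 0
    for t, tok in events:
        # Emit any control whose anticipated position has been reached.
        while ci < len(controls) and controls[ci][0] - delta <= t:
            out.append(("control", controls[ci][1]))
            ci += 1
        out.append(("event", tok))
    # Any remaining controls fall after the last event.
    for _, tok in controls[ci:]:
        out.append(("control", tok))
    return out


events = [(0.0, "a"), (3.0, "b"), (8.0, "c")]
controls = [(6.0, "X")]  # anticipated at time 6 - 5 = 1, i.e. before event "b"
sequence = interleave(events, controls)
```

A model trained on such interleaved sequences sees each control before generating the events surrounding it, which is what makes infilling tasks such as accompaniment expressible as ordinary left-to-right generation.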