The Concatenator: A Bayesian Approach To Real Time Concatenative Musaicing

We present ``The Concatenator,'' a real-time system for audio-guided concatenative synthesis. As in Driedger et al.'s ``musaicing'' (or ``audio mosaicing'') technique, we concatenate a fixed number of windows from a corpus of audio to re-create the harmonic and percussive aspects of a target audio stream. Unlike Driedger's NMF-based technique, however, we take an explicitly Bayesian point of view, in which corpus window indices are hidden states and the target audio stream is an observation. We use a particle filter to infer the best hidden corpus states in real time. Our transition model includes a tunable parameter that controls the time-continuity of corpus grains, and our observation model lets users prioritize how quickly windows change to match the target. Because the computational complexity of the system is independent of the corpus size, it scales to corpora that are hours long, an important feature in the age of vast audio data collections. Within The Concatenator module itself, composers can vary grain length, fit to target, and pitch shift in real time while reacting to the sounds they hear, which lets them iterate on ideas rapidly. We evaluate the system with extensive quantitative tests of the effects of its parameters, as well as a qualitative evaluation with artistic insights. Based on the quality of the results, we believe that the real-time capability unlocks new avenues for musical expression and control, suitable for live performance and modular synthesis integration, and that it represents an important breakthrough in concatenative synthesis technology.
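To make the hidden-state formulation concrete, the following is a minimal sketch of one particle-filter update over corpus window indices. It is an illustration under stated assumptions, not the system's actual implementation: the function name `particle_filter_step`, the parameters `p_continue` and `temperature`, the equal-weight mix of windows, and the Euclidean spectral distance are all hypothetical choices made for this example.

```python
import numpy as np

def particle_filter_step(states, weights, corpus, target,
                         p_continue, temperature, rng):
    """One update for a single target spectral frame (illustrative only).

    states:  (P, k) int array, k corpus window indices per particle
    weights: (P,) particle weights summing to 1
    corpus:  (N, F) nonnegative magnitude spectra, one row per corpus window
    target:  (F,) nonnegative magnitude spectrum of the incoming frame
    """
    n_windows = corpus.shape[0]

    # Transition model (assumed form): with probability p_continue a grain
    # advances to the next corpus window, otherwise it jumps uniformly.
    keep = rng.random(states.shape) < p_continue
    jumps = rng.integers(0, n_windows, size=states.shape)
    states = np.where(keep, (states + 1) % n_windows, jumps)

    # Observation model (assumed form): score each particle by the distance
    # between the target and the equal-weight mix of its k windows.
    mix = corpus[states].mean(axis=1)            # shape (P, F)
    err = np.linalg.norm(mix - target, axis=1)   # shape (P,)
    log_like = -temperature * err
    weights = weights * np.exp(log_like - log_like.max())
    weights = weights / weights.sum()

    # Resample when the effective sample size falls below half of P.
    n_particles = len(weights)
    ess = 1.0 / np.sum(weights ** 2)
    if ess < 0.5 * n_particles:
        idx = rng.choice(n_particles, size=n_particles, p=weights)
        states = states[idx]
        weights = np.full(n_particles, 1.0 / n_particles)
    return states, weights
```

In this sketch the corpus is only ever indexed at the particles' current states, so the per-frame cost depends on the number of particles, the number of windows per particle, and the spectrum size, but not on the number of corpus windows. A larger `p_continue` favors longer contiguous runs through the corpus, and a larger `temperature` concentrates weight on particles that fit the target closely.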
