Real-time music information retrieval (RT-MIR) has great potential to augment the capabilities of traditional acoustic instruments. We develop RT-MIR techniques aimed at augmenting percussive fingerstyle, which blends acoustic guitar playing with guitar body percussion. We formulate several design objectives for RT-MIR systems for augmented instrument performance: (i) causal constraint, (ii) perceptually negligible action-to-sound latency, (iii) control intimacy support, and (iv) synthesis control support. We present and evaluate real-time guitar body percussion recognition and embedding learning techniques based on convolutional neural networks (CNNs) and CNNs jointly trained with variational autoencoders (VAEs). We introduce a taxonomy of guitar body percussion based on hand part and location. We follow a cross-dataset evaluation approach, collecting three datasets labelled according to the taxonomy. The embedding quality of the models is assessed using the KL divergence between distributions corresponding to different taxonomic classes. Results indicate that the networks are strong classifiers, especially in a simplified 2-class recognition task, and that the VAEs yield better class separation than the CNNs, as evidenced by increased KL divergence between class distributions. We argue that the VAE embedding quality could support control intimacy and rich interaction when the latent space's parameters are used to control an external synthesis engine. We also identify further design challenges around generalisation to different datasets.
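As a minimal sketch of the kind of embedding-quality measure described above, the Python code below fits a diagonal Gaussian to the embedding vectors of each class and computes the closed-form KL divergence between the two fitted distributions. The diagonal-Gaussian fit, the symmetric sum of both KL directions, and the names `fit_diag_gaussian`, `kl_diag_gaussian` and `symmetric_kl` are illustrative assumptions. The abstract does not state how the class distributions are estimated, so this should not be read as the authors' exact procedure.

```python
import math
from statistics import fmean, pvariance
from typing import List, Sequence, Tuple

# A diagonal Gaussian: (per-dimension means, per-dimension variances).
DiagGaussian = Tuple[List[float], List[float]]


def fit_diag_gaussian(points: Sequence[Sequence[float]], eps: float = 1e-8) -> DiagGaussian:
    """Fit a diagonal Gaussian to one class's embeddings (assumption: diagonal covariance)."""
    dims = len(points[0])
    means = [fmean(p[d] for p in points) for d in range(dims)]
    # Population variance per dimension; eps avoids division by zero for constant dimensions.
    variances = [pvariance([p[d] for p in points], mu=means[d]) + eps for d in range(dims)]
    return means, variances


def kl_diag_gaussian(p: DiagGaussian, q: DiagGaussian) -> float:
    """Closed-form KL(P || Q) for diagonal Gaussians, summed over dimensions."""
    mu_p, var_p = p
    mu_q, var_q = q
    total = 0.0
    for mp, vp, mq, vq in zip(mu_p, var_p, mu_q, var_q):
        total += vp / vq + (mq - mp) ** 2 / vq - 1.0 + math.log(vq / vp)
    return 0.5 * total


def symmetric_kl(p: DiagGaussian, q: DiagGaussian) -> float:
    """KL(P || Q) + KL(Q || P); larger values indicate better-separated classes."""
    return kl_diag_gaussian(p, q) + kl_diag_gaussian(q, p)


if __name__ == "__main__":
    # Hypothetical 2-D embeddings for two taxonomic classes (made-up data, for illustration only).
    class_a = [[0.1, 0.2], [0.0, -0.1], [0.2, 0.1]]
    class_b = [[1.0, 1.1], [0.9, 1.3], [1.2, 0.8]]
    print(symmetric_kl(fit_diag_gaussian(class_a), fit_diag_gaussian(class_b)))
```

The symmetric sum is used here only because KL divergence is asymmetric and a single class-separation score is convenient; comparing one direction, or fitting full-covariance Gaussians, would be equally valid choices under different assumptions.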